WO2021082635A1 - Method and apparatus for detecting a region of interest, readable storage medium, and terminal device - Google Patents

Method and apparatus for detecting a region of interest, readable storage medium, and terminal device

Info

Publication number
WO2021082635A1
WO2021082635A1 (PCT/CN2020/109068)
Authority
WO
WIPO (PCT)
Prior art keywords
eye
image
eye image
target face
coordinate point
Prior art date
Application number
PCT/CN2020/109068
Other languages
English (en)
French (fr)
Inventor
张�成
王杉杉
胡文泽
王孝宇
Original Assignee
深圳云天励飞技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术股份有限公司
Publication of WO2021082635A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0269Targeted advertisements based on user profile or attribute
    • G06Q30/0271Personalized advertisement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • This application belongs to the field of image processing technology, and in particular relates to a method, device, computer-readable storage medium, and terminal equipment for detecting a region of interest.
  • the embodiments of the present application provide a method, device, computer-readable storage medium, and terminal device for detecting an area of interest to solve the problem that the existing method for detecting an area of interest is very expensive and difficult to be widely used.
  • the first aspect of the embodiments of the present application provides a method for detecting a region of interest, which may include:
  • the eye area of interest is determined according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
  • the determining the eye area of interest according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image includes:
  • the screen area corresponding to the output coordinate point is determined as the eye focus area.
  • the method further includes:
  • Each calibration sample set is constructed separately, where the s-th calibration sample set includes F_s vector samples, each vector sample is the input information vector when the s-th screen area is paid attention to, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, 1 ≤ s ≤ SN, and F_s is a positive integer;
  • each calibration sample set includes:
  • the input information vector of each frame of sample image is constructed as the s-th calibration sample set.
  • the method further includes:
  • the reference coordinate point is an output coordinate point corresponding to K frames of face images collected before the target face image, and K is a positive integer;
  • the calculating the head posture of the face image includes:
  • the head posture is calculated according to the rotation matrix.
  • the extracting the left-eye image and the right-eye image in the target face image includes:
  • the left-eye image is extracted from the first area, and the right-eye image is extracted from the second area.
  • a second aspect of the embodiments of the present application provides a device for detecting a region of interest, which may include:
  • the image acquisition module is used to acquire the target face image to be detected
  • a head posture calculation module for calculating the head posture of the target face image
  • An eye image extraction module for extracting left eye images and right eye images in the target face image
  • the eye key point detection module is used to perform eye key point detection in the left eye image and the right eye image to obtain the position information of each eye key point in the left eye image and the right eye image;
  • a line-of-sight direction determining module configured to determine the line-of-sight direction according to the head posture, the left-eye image and the right-eye image;
  • the eye area of interest determination module is configured to determine the eye area of interest based on the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
  • the eye attention area determination module may include:
  • the input information vector construction sub-module is used to construct the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image as an input information vector;
  • the neural network processing sub-module is used to process the input information vector using a pre-trained fully connected neural network to obtain output coordinate points;
  • the eye area of interest determination sub-module is configured to determine the screen area corresponding to the output coordinate point as the eye area of interest.
  • the device for detecting a region of interest may further include:
  • the screen area division module is used to divide the preset screen into SN screen areas, where SN is an integer greater than 1;
  • the calibration sample set construction module is used to construct each calibration sample set separately.
  • the s-th calibration sample set includes F_s vector samples, each vector sample is the input information vector when the s-th screen area is paid attention to, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, 1 ≤ s ≤ SN, and F_s is a positive integer;
  • the network training module is used to train the fully connected neural network in the initial state by using each calibration sample set to obtain the pre-trained fully connected neural network.
  • the calibration sample set construction module may include:
  • a pattern display sub-module for displaying a preset pattern at the center coordinate point of the s-th screen area of the screen
  • the sample image collection sub-module is used to separately collect sample images of each frame, where the sample images are the face images when the subject pays attention to the pattern;
  • the input information vector construction sub-module is used to construct the input information vector of each frame sample image
  • the calibration sample set construction sub-module is used to construct the input information vector of each frame of sample image into the s-th calibration sample set.
  • the eye attention area determination module may further include:
  • the reference coordinate point obtaining sub-module is configured to obtain a reference coordinate point, where the reference coordinate point is an output coordinate point corresponding to K frames of face images collected before the target face image, and K is a positive integer;
  • the first coordinate point set construction sub-module is configured to construct the output coordinate point corresponding to the target face image and the reference coordinate point as a first coordinate point set;
  • the outlier elimination sub-module is used to eliminate outliers from the first coordinate point set to obtain a second coordinate point set;
  • the smoothing sub-module is used to calculate the mean coordinate point of the second coordinate point set, and determine the mean coordinate point of the second coordinate set as the output coordinate point after smoothing.
  • the head posture calculation module may include:
  • a rotation matrix calculation sub-module configured to calculate the rotation matrix of the target face image according to the affine matrix
  • the head posture calculation sub-module is used to calculate the head posture according to the rotation matrix.
  • the eye image extraction module may include:
  • the face key point detection sub-module is used to perform face key point detection in the target face image to obtain position information of each face key point in the target face image;
  • the eye area determination sub-module is used to determine the first area of the left eye image in the target face image according to the position information of each key point of the face, and the right eye image in the target face image The second area in
  • the eye image extraction sub-module is configured to extract the left eye image from the first area and extract the right eye image from the second area.
  • the third aspect of the embodiments of the present application provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps of any of the above-mentioned region-of-interest detection methods are implemented .
  • the fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and running on the processor.
  • the processor executes the computer program, Steps for implementing any of the above-mentioned methods for detecting a region of interest.
  • the fifth aspect of the embodiments of the present application provides a computer program product, which when the computer program product runs on a terminal device, causes the terminal device to execute the steps of any of the above-mentioned region-of-interest detection methods.
  • the embodiment of the application obtains the target face image to be detected; calculates the head posture of the target face image; extracts the left-eye image and the right-eye image from the target face image; performs eye key point detection in the left-eye image and the right-eye image respectively to obtain the position information of each eye key point in the left-eye image and the right-eye image; determines the line of sight direction according to the head posture, the left-eye image and the right-eye image; and determines the eye area of interest according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
  • No expensive precision instruments are required; instead, the head posture, the line of sight direction, and the position information of the eye key points in the left-eye and right-eye images are obtained by image analysis of face images and used to determine the eye area of interest, which greatly reduces the cost and allows a much wider range of applications.
  • FIG. 1 is a flowchart of an embodiment of a method for detecting a region of interest in an embodiment of the application
  • Fig. 2 is a schematic flow chart of calculating the head posture of a target face image
  • Fig. 3 is a schematic flow chart of extracting a left-eye image and a right-eye image in a target face image
  • Figure 4 is a schematic diagram of key points of a human face
  • Figure 5 is a schematic diagram of the key points of the eye
  • FIG. 6 is a schematic flow chart of determining the eye area of interest according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image;
  • Figure 7 is a schematic flow chart of smoothing the output coordinate points
  • Figure 8 is a schematic flow chart of training a fully connected neural network
  • Fig. 9 is a schematic diagram of dividing the screen into several screen areas
  • Figure 10 is a schematic flow chart of constructing a calibration sample set
  • FIG. 11 is a structural diagram of an embodiment of a device for detecting a region of interest in an embodiment of the application.
  • FIG. 12 is a schematic block diagram of a terminal device in an embodiment of this application.
  • an embodiment of a method for detecting a region of interest in an embodiment of the present application may include:
  • Step S101 Obtain a target face image to be detected.
  • the execution subject of the embodiment of the present application may be a terminal device with a camera and a screen, including but not limited to a desktop computer, a notebook, a palmtop computer, a smart phone, and a smart TV.
  • the terminal device may collect the current face image through the camera facing the user, that is, the target face image .
  • Step S102 Calculate the head posture of the target face image.
  • step S102 may specifically include the process shown in FIG. 2:
  • Step S1021 Calculate the affine matrix of the target face image.
  • In the embodiments of the present application, a pre-trained 3D Morphable Model (3DMM) is preferably used to process the target face image. 3DMM is a three-dimensional face reconstruction model that can obtain corresponding three-dimensional face information from two-dimensional face information.
  • The key to obtaining the corresponding three-dimensional face information from the two-dimensional face information is to determine the mapping relationship between the two; this mapping relationship can be expressed in the form of a matrix, that is, an affine matrix.
  • The first three columns of Matrix form a linear transformation matrix, r_ij is the element in the i-th row and j-th column of the linear transformation matrix, 1 ≤ i ≤ 3, 1 ≤ j ≤ 3; the last column of Matrix is the translation vector, t_k is the k-th element of the translation vector, and 1 ≤ k ≤ 3.
  • Step S1022 Calculate the rotation matrix of the target face image according to the affine matrix.
  • first, the first vector R 1 and the second vector R 2 can be extracted from the affine matrix according to the following formula:
  • the first vector and the second vector can be normalized separately according to the following formula:
  • M is the rotation matrix
  • Step S1023 Calculate the head posture according to the rotation matrix.
  • the head posture can be calculated according to the following formula:
  • x is the pitch angle of the head posture
  • y is the deflection angle of the head posture
  • z is the roll angle of the head posture
  • Step S103 Extract a left eye image and a right eye image in the target face image.
  • step S103 may specifically include the process shown in FIG. 3:
  • Step S1031 Perform face key point detection in the target face image to obtain position information of each face key point in the target face image.
  • the key points of the human face include, but are not limited to: key points of eyebrows, eyes, nose, mouth, facial contours and other parts.
  • Figure 4 shows a schematic diagram of the key points of a human face, where numbers 0 to 16 are the key points of the facial contour, numbers 17 to 21 are the key points of the left eyebrow, numbers 22 to 26 are the key points of the right eyebrow, numbers 27 to 35 are the key points of the nose, numbers 36 to 41 are the key points of the left eye, numbers 42 to 47 are the key points of the right eye, and numbers 48 to 67 are the key points of the mouth.
  • 3DMM may be used to perform face key point detection in the target face image, so as to obtain position information of each face key point in the target face image.
  • Step S1032 Determine the first area of the left-eye image in the target face image and the second area of the right-eye image in the target face image according to the position information of each key point of the face.
  • numbers 36 to 41 are the key points of the left eye
  • numbers 42 to 47 are the key points of the right eye.
  • the minimum value of the abscissa of the left eye is recorded as left_x_min
  • the abscissa of the left eye The maximum value is recorded as left_x_max
  • the minimum value of the ordinate of the left eye is recorded as left_y_min
  • the maximum value of the ordinate of the left eye is recorded as left_y_max
  • the minimum value of the abscissa of the right eye is recorded as right_x_min
  • the maximum value of the abscissa of the right eye Denoted as right_x_max
  • the minimum value of the ordinate of the right eye is denoted as right_y_min
  • the maximum value of the ordinate of the right eye is denoted as right_y_max.
  • In a specific implementation of the embodiments of the present application, the rectangular area (denoted as LA1) formed by the following four coordinate points can be used as the first area: (left_x_min, left_y_max), (left_x_min, left_y_min), (left_x_max, left_y_max), (left_x_max, left_y_min); and the rectangular area (denoted as RA1) formed by the following four coordinate points can be used as the second area: (right_x_min, right_y_max), (right_x_min, right_y_min), (right_x_max, right_y_max), (right_x_max, right_y_min).
  • Considering that cropping the eye images directly from these extreme values may lose edge information, LA1 can also be expanded to obtain a new rectangular area LA2, and LA2 is used as the first area; RA1 is expanded to obtain a new rectangular area RA2, and RA2 is used as the second area.
  • Denoting the coordinates of the four vertices of LA2 as (left_x_min_new, left_y_max_new), (left_x_min_new, left_y_min_new), (left_x_max_new, left_y_max_new), (left_x_max_new, left_y_min_new), and the coordinates of the four vertices of RA2 as (right_x_min_new, right_y_max_new), (right_x_min_new, right_y_min_new), (right_x_max_new, right_y_max_new), (right_x_max_new, right_y_min_new), then:
  • left_x_min_new = left_x_min - p × (left_x_max - left_x_min);
  • left_x_max_new = left_x_max + p × (left_x_max - left_x_min);
  • left_y_min_new = left_y_min - p × (left_y_max - left_y_min);
  • left_y_max_new = left_y_max + p × (left_y_max - left_y_min);
  • right_x_min_new = right_x_min - p × (right_x_max - right_x_min);
  • right_x_max_new = right_x_max + p × (right_x_max - right_x_min);
  • right_y_min_new = right_y_min - p × (right_y_max - right_y_min);
  • right_y_max_new = right_y_max + p × (right_y_max - right_y_min);
  • p is a preset expansion coefficient whose specific value can be set according to the actual situation; it is preferably set to 1.4, that is, the width and height of the final eye image are 1.4 times those of the original eye image. In this way, the loss of edge information can be effectively avoided.
  • Step S1033 Extract the left-eye image from the first area, and extract the right-eye image from the second area.
  • the eye image is extracted according to the position information of the key points of each face obtained by the key point detection, and high-precision eye images (including left-eye images and right-eye images) can be obtained.
  • This eye image can greatly improve the accuracy of the detection results of the area of interest.
  • Step S104 Perform eye key point detection in the left eye image and the right eye image respectively to obtain position information of each eye key point in the left eye image and the right eye image.
  • the key points of the eye include but are not limited to: the center point of the iris, the upper edge of the iris, the lower edge of the iris, the left edge of the iris, the right edge of the iris, the edge of the upper eyelid, the edge of the lower eyelid, the left eye corner and the right eye corner .
  • In a specific implementation of the embodiments of the present application, a pre-trained Stacked Hourglass Model (SHM) can be used to perform eye key point detection in the left-eye image and the right-eye image, so as to obtain the position information of each eye key point in the left-eye image and the right-eye image.
  • The stacked hourglass model can transform the image at multiple scales to ensure a large receptive field, and it also generalizes well to blurred images, which allows the embodiments of the present application to achieve high accuracy even with an ordinary camera.
  • Step S105 Determine the line of sight direction according to the head posture, the left-eye image, and the right-eye image.
  • a pre-trained gaze model may be used to process the head posture, the left-eye image, and the right-eye image, so as to obtain the line of sight direction.
  • Step S106 Determine the eye area of interest according to the position information of each key point of the eye in the head posture, the line of sight direction, the left-eye image and the right-eye image.
  • step S106 may specifically include the process shown in FIG. 6:
  • Step S1061 Construct the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image as an input information vector.
  • That is, the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image are concatenated in sequence into a row vector, namely the input information vector.
  • Step S1062 Use the pre-trained fully connected neural network to process the input information vector to obtain output coordinate points.
  • The specific number of layers of the fully connected neural network and the number of nodes in each layer can be set in advance according to the actual situation; each node is connected to all nodes of the previous layer and combines the features extracted by the previous layer.
  • The embodiments of the present application preferably adopt a two-layer fully connected neural network to perform regression analysis on the input information vector, so as to obtain the output coordinate point.
  • the specific training process of the fully connected neural network will be described in detail later, and it will not be repeated here.
  • Step S1063 Determine the screen area corresponding to the output coordinate point as the eye focus area.
  • The screen of the terminal device may be divided in advance into SN (SN is an integer greater than 1) screen areas, which are numbered from top to bottom and from left to right as: screen area 1, screen area 2, ..., screen area s, ..., screen area SN, where 1 ≤ s ≤ SN. If the output coordinate point falls within the coordinate range of screen area 1, screen area 1 can be determined as the eye focus area; if the output coordinate point falls within the coordinate range of screen area 2, screen area 2 can be determined as the eye focus area; ...; if the output coordinate point falls within the coordinate range of screen area s, screen area s can be determined as the eye focus area; ...; if the output coordinate point falls within the coordinate range of screen area SN, screen area SN can be determined as the eye focus area.
  • step S1062 the result may be smoothed through the process shown in FIG. 7:
  • Step S701 Obtain reference coordinate points.
  • the reference coordinate point is the output coordinate point corresponding to the K frames of face images collected before the target face image
  • K is a positive integer
  • its specific value can be set according to the actual situation, preferably, it can be set to 4.
  • Step S702 Construct the output coordinate point corresponding to the target face image and the reference coordinate point as a first coordinate point set.
  • Step S703 Eliminate outliers from the first coordinate point set to obtain a second coordinate point set.
  • The mean coordinate point of the first coordinate point set may be calculated first, and then the distance between each coordinate point in the first coordinate point set and the mean coordinate point may be calculated separately. If the distance between a coordinate point and the mean coordinate point is greater than a preset distance threshold, that coordinate point can be determined as an outlier and excluded from the first coordinate point set.
  • the specific value of the distance threshold may be set according to actual conditions, and the embodiment of the present application does not specifically limit it.
  • Step S704 Calculate the mean coordinate points of the second coordinate point set, and determine the mean coordinate points of the second coordinate set as the smoothed output coordinate points.
  • the screen area corresponding to the smoothed output coordinate points may be determined as the eye focus area.
  • the smoothing process shown in FIG. 7 can effectively reduce the impact of abnormal data. Based on the output coordinate points after this smoothing process, the accuracy of the detection result of the region of interest can be greatly improved.
  • the fully connected neural network may be trained in advance through the process shown in FIG. 8:
  • Step S801 Divide the preset screen into SN screen areas.
  • Step S802 Construct each calibration sample set respectively.
  • the s-th calibration sample set includes F_s vector samples, each vector sample is the input information vector when the s-th screen area is paid attention to, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, and F_s is a positive integer.
  • the specific construction process can include the steps shown in Figure 10:
  • Step S8021 Display a preset pattern at the center coordinate point of the s-th screen area of the screen.
  • a preset pattern can be displayed at the center of the s-th screen area of the screen to attract the subject's eyes to the s-th screen area.
  • the pattern can be set as a circular pattern, a square pattern, a triangular pattern, etc. according to actual conditions, and the embodiment of the present application does not specifically limit its form.
  • the pattern can be a color with a large contrast with the screen background. For example, if the screen background color is black, the pattern can be white, red, Green, purple, etc. are displayed.
  • Step S8022 collect sample images of each frame respectively.
  • The sample image is a face image captured when the subject pays attention to the pattern. After the pattern is displayed at the center of the s-th screen area of the screen, the subject's gaze will be attracted to the pattern; at this time, the camera can be used to sequentially collect multiple frames of face images of the subject while attending to the pattern, namely the sample images.
  • Step S8023 Construct the input information vector of each frame sample image respectively.
  • The calculation process of the input information vector of each frame of sample image is similar to the foregoing process. For details, refer to the detailed processes in step S102, step S103, step S104, step S105 and step S1061, which are not repeated here.
  • Step S8024 Construct the input information vector of each frame of sample image as the s-th calibration sample set.
  • By traversing the values of s from 1 to SN, each calibration sample set can be constructed. That is, the pattern is displayed at the center of the first screen area (that is, the upper-left screen area) and the first calibration sample set is constructed from the collected frames of sample images;
  • the pattern is then displayed at the center of the second screen area (that is, the middle-left screen area) and the second calibration sample set is constructed from the collected frames of sample images; the pattern is displayed at the center of the third screen area (that is, the lower-left screen area) and the third calibration sample set is constructed from the collected frames of sample images; and so on.
  • Step S803 Use each calibration sample set to train the fully connected neural network in the initial state to obtain the pre-trained fully connected neural network.
  • the training process of the neural network is a commonly used technology in the prior art, and for details, reference may be made to any neural network training method in the prior art, which will not be repeated in this embodiment of the application.
  • In summary, the embodiment of the application obtains the target face image to be detected; calculates the head posture of the target face image; extracts the left-eye image and the right-eye image from the target face image; performs eye key point detection in the left-eye image and the right-eye image to obtain the position information of each eye key point in the left-eye image and the right-eye image; determines the line of sight direction according to the head posture, the left-eye image and the right-eye image; and determines the eye focus area according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
  • No expensive precision instruments are required; instead, the head posture, the line of sight direction, and the position information of the eye key points in the left-eye and right-eye images are obtained by image analysis of face images and used to determine the eye area of interest, which greatly reduces the cost and allows a much wider range of applications.
  • FIG. 11 shows a structural diagram of an embodiment of a device for detecting a region of interest provided in an embodiment of the present application.
  • a device for detecting a region of interest may include:
  • the image acquisition module 1101 is used to acquire the target face image to be detected
  • the head posture calculation module 1102 is used to calculate the head posture of the target face image
  • the eye image extraction module 1103 is used to extract the left eye image and the right eye image in the target face image
  • the eye key point detection module 1104 is configured to perform eye key point detection in the left eye image and the right eye image to obtain each eye key point in the left eye image and the right eye image Location information;
  • the line-of-sight direction determining module 1105 is configured to determine the line-of-sight direction according to the head posture, the left-eye image, and the right-eye image;
  • the eye area of interest determination module 1106 is configured to determine the eye area of interest based on the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
  • the eye attention area determination module may include:
  • the input information vector construction sub-module is used to construct the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image as an input information vector;
  • the neural network processing sub-module is used to process the input information vector using a pre-trained fully connected neural network to obtain output coordinate points;
  • the eye area of interest determination sub-module is configured to determine the screen area corresponding to the output coordinate point as the eye area of interest.
  • the device for detecting a region of interest may further include:
  • the screen area division module is used to divide the preset screen into SN screen areas, where SN is an integer greater than 1;
  • the calibration sample set construction module is used to construct each calibration sample set separately.
  • the s-th calibration sample set includes F_s vector samples, each vector sample is the input information vector when the s-th screen area is paid attention to, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, 1 ≤ s ≤ SN, and F_s is a positive integer;
  • the network training module is used to train the fully connected neural network in the initial state by using each calibration sample set to obtain the pre-trained fully connected neural network.
  • the calibration sample set construction module may include:
  • a pattern display sub-module for displaying a preset pattern at the center coordinate point of the s-th screen area of the screen
  • the sample image collection sub-module is used to separately collect sample images of each frame, where the sample images are the face images when the subject pays attention to the pattern;
  • the input information vector construction sub-module is used to construct the input information vector of each frame sample image
  • the calibration sample set construction sub-module is used to construct the input information vector of each frame of sample image into the s-th calibration sample set.
  • the eye attention area determination module may further include:
  • the reference coordinate point obtaining sub-module is configured to obtain a reference coordinate point, where the reference coordinate point is an output coordinate point corresponding to K frames of face images collected before the target face image, and K is a positive integer;
  • the first coordinate point set construction sub-module is configured to construct the output coordinate point corresponding to the target face image and the reference coordinate point as a first coordinate point set;
  • the outlier elimination sub-module is used to eliminate outliers from the first coordinate point set to obtain a second coordinate point set;
  • the smoothing sub-module is used to calculate the mean coordinate point of the second coordinate point set, and determine the mean coordinate point of the second coordinate set as the output coordinate point after smoothing.
  • the head posture calculation module may include:
  • a rotation matrix calculation sub-module configured to calculate the rotation matrix of the target face image according to the affine matrix
  • the head posture calculation sub-module is used to calculate the head posture according to the rotation matrix.
  • the eye image extraction module may include:
  • the face key point detection sub-module is used to perform face key point detection in the target face image to obtain position information of each face key point in the target face image;
  • the eye area determination sub-module is used to determine the first area of the left eye image in the target face image according to the position information of each key point of the face, and the right eye image in the target face image The second area in
  • the eye image extraction sub-module is configured to extract the left eye image from the first area and extract the right eye image from the second area.
  • FIG. 12 shows a schematic block diagram of a terminal device provided by an embodiment of the present application. For ease of description, only parts related to the embodiment of the present application are shown.
  • the terminal device 12 of this embodiment includes: a processor 120, a memory 121, and a computer program 122 stored in the memory 121 and running on the processor 120.
  • the processor 120 executes the computer program 122
  • the steps in the above embodiments of the region of interest detection method are implemented, for example, steps S101 to S106 shown in FIG. 1.
  • the processor 120 executes the computer program 122
  • the functions of the modules/units in the foregoing device embodiments are implemented, for example, the functions of the modules 1101 to 1106 shown in FIG. 11.
  • the computer program 122 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 121 and executed by the processor 120 to complete This application.
  • the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 122 in the terminal device 12.
  • the terminal device 12 may be a computing device such as a desktop computer, a notebook, a palmtop computer, a smart phone, and a smart TV.
  • FIG. 12 is only an example of the terminal device 12, and does not constitute a limitation on the terminal device 12. It may include more or less components than shown in the figure, or a combination of certain components, or different components.
  • the terminal device 12 may also include input and output devices, network access devices, buses, and the like.
  • the processor 120 may be a central processing unit (Central Processing Unit, CPU), other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the processor 120 may be the nerve center and command center of the terminal device 12, and the processor 120 may generate operation control signals according to instruction operation codes and timing signals, and complete the control of fetching instructions and executing instructions.
  • the memory 121 may be an internal storage unit of the terminal device 12, such as a hard disk or a memory of the terminal device 12.
  • the memory 121 may also be an external storage device of the terminal device 12, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a Flash Card equipped on the terminal device 12. Further, the memory 121 may also include both an internal storage unit of the terminal device 12 and an external storage device.
  • the memory 121 is used to store the computer program and other programs and data required by the terminal device 12.
  • the memory 121 may also be used to temporarily store data that has been output or will be output.
  • the terminal device 12 may also include a communication module, which may provide communication solutions applied to network devices, such as Wireless Local Area Networks (WLAN) (e.g. Wi-Fi networks), Bluetooth, Zigbee, mobile communication networks, Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC) and Infrared (IR).
  • the communication module may be one or more devices integrating at least one communication processing module.
  • the communication module may include an antenna, and the antenna may have only one array element or an antenna array including multiple array elements.
  • the communication module can receive electromagnetic waves through an antenna, frequency-modulate and filter the electromagnetic wave signals, and send the processed signals to the processor.
  • the communication module can also receive the signal to be sent from the processor, perform frequency modulation and amplification, and convert it into electromagnetic waves to radiate through the antenna.
  • the terminal device 12 may also include a power management module, which can receive input from an external power source, battery, and/or charger, and supply power to the processor, the memory, the communication module, and the like.
  • the terminal device 12 may also include a display module, which may be used to display information input by the user or information provided to the user.
  • the display module may include a display panel.
  • the display panel may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), etc.
  • the touch panel may cover the display panel. When the touch panel detects a touch operation on or near it, the operation is transmitted to the processor to determine the type of the touch event, and the processor then provides corresponding visual output on the display panel according to the type of the touch event.
  • the disclosed device/terminal device and method may be implemented in other ways.
  • the device/terminal device embodiments described above are merely illustrative.
  • the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the embodiments of the present application provide a computer program product; when the computer program product runs on a terminal device, the terminal device can implement the steps in the foregoing method embodiments.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the present application implements all or part of the processes in the above-mentioned embodiments and methods, and can also be completed by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, U disk, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM, Read-Only Memory) , Random Access Memory (RAM, Random Access Memory), electrical carrier signal, telecommunications signal, and software distribution media, etc.
  • the content contained in the computer-readable medium can be appropriately added or deleted according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.


Abstract

A method and apparatus for detecting a region of interest, a computer-readable storage medium, and a terminal device. The method acquires a target face image to be detected (S101); calculates the head posture of the target face image (S102); extracts a left-eye image and a right-eye image from the target face image (S103); performs eye key point detection in the left-eye image and the right-eye image respectively to obtain position information of each eye key point in the left-eye image and the right-eye image (S104); determines a line-of-sight direction according to the head posture, the left-eye image and the right-eye image (S105); and determines an eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image (S106). No expensive precision instrument is required; instead, the eye attention area is determined by image analysis of face images, which greatly reduces cost and allows much wider application.

Description

Method and apparatus for detecting a region of interest, readable storage medium, and terminal device
TECHNICAL FIELD
This application belongs to the field of image processing technology, and in particular relates to a method and apparatus for detecting a region of interest, a computer-readable storage medium, and a terminal device.
BACKGROUND
With the development of image recognition technology, human-computer interaction driven by human gaze has become a topic actively explored by researchers. In commercial settings, a customer's degree of interest in a product can be judged from the direction of the customer's attention, and reasonable advertisement recommendations can then be made. This not only brings customers a novel shopping experience, but also brings merchants better returns. The relative position of the iris within the visible part of the eyeball moves as the direction of attention changes, which makes it possible to predict the direction of attention from eye key points. However, when the variable range of attention is small, the offset of the iris position is small, making effective quantitative analysis difficult and accurate estimation of the attention area even harder. Eye trackers in the prior art can track eye movement using infrared cameras and precise sensors, but they are very expensive and difficult to apply widely.
SUMMARY
In view of this, the embodiments of the present application provide a method and apparatus for detecting a region of interest, a computer-readable storage medium, and a terminal device, so as to solve the problem that existing region-of-interest detection methods are very expensive and difficult to apply widely.
A first aspect of the embodiments of the present application provides a method for detecting a region of interest, which may include:
acquiring a target face image to be detected;
calculating a head posture of the target face image;
extracting a left-eye image and a right-eye image from the target face image;
performing eye key point detection in the left-eye image and the right-eye image respectively, to obtain position information of each eye key point in the left-eye image and the right-eye image;
determining a line-of-sight direction according to the head posture, the left-eye image and the right-eye image;
determining an eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
Further, the determining the eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image includes:
constructing the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image into an input information vector;
processing the input information vector using a pre-trained fully connected neural network to obtain an output coordinate point;
determining the screen area corresponding to the output coordinate point as the eye attention area.
Further, before processing the input information vector using the pre-trained fully connected neural network, the method further includes:
dividing a preset screen into SN screen areas, SN being an integer greater than 1;
constructing calibration sample sets respectively, where the s-th calibration sample set includes F_s vector samples, each vector sample is the input information vector obtained when the s-th screen area is attended to, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, 1 ≤ s ≤ SN, and F_s is a positive integer;
training a fully connected neural network in an initial state using the calibration sample sets, to obtain the pre-trained fully connected neural network.
Further, the constructing the calibration sample sets respectively includes:
displaying a preset pattern at the center coordinate point of the s-th screen area of the screen;
collecting frames of sample images respectively, where a sample image is a face image captured while a subject is attending to the pattern;
constructing an input information vector for each frame of sample image respectively;
constructing the input information vectors of the frames of sample images into the s-th calibration sample set.
Further, after processing the input information vector using the pre-trained fully connected neural network to obtain the output coordinate point, the method further includes:
acquiring reference coordinate points, where the reference coordinate points are the output coordinate points corresponding to K frames of face images collected before the target face image, and K is a positive integer;
constructing the output coordinate point corresponding to the target face image and the reference coordinate points into a first coordinate point set;
removing outliers from the first coordinate point set to obtain a second coordinate point set;
calculating a mean coordinate point of the second coordinate point set, and determining the mean coordinate point of the second coordinate point set as the smoothed output coordinate point.
Further, the calculating the head posture of the face image includes:
calculating an affine matrix of the target face image;
calculating a rotation matrix of the target face image according to the affine matrix;
calculating the head posture according to the rotation matrix.
Further, the extracting the left-eye image and the right-eye image from the target face image includes:
performing face key point detection in the target face image, to obtain position information of each face key point in the target face image;
determining, according to the position information of each face key point, a first area of the left-eye image in the target face image and a second area of the right-eye image in the target face image;
extracting the left-eye image from the first area, and extracting the right-eye image from the second area.
A second aspect of the embodiments of the present application provides an apparatus for detecting a region of interest, which may include:
an image acquisition module, configured to acquire a target face image to be detected;
a head posture calculation module, configured to calculate a head posture of the target face image;
an eye image extraction module, configured to extract a left-eye image and a right-eye image from the target face image;
an eye key point detection module, configured to perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain position information of each eye key point in the left-eye image and the right-eye image;
a line-of-sight direction determination module, configured to determine a line-of-sight direction according to the head posture, the left-eye image and the right-eye image;
an eye attention area determination module, configured to determine an eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
Further, the eye attention area determination module may include:
an input information vector construction sub-module, configured to construct the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image into an input information vector;
a neural network processing sub-module, configured to process the input information vector using a pre-trained fully connected neural network to obtain an output coordinate point;
an eye attention area determination sub-module, configured to determine the screen area corresponding to the output coordinate point as the eye attention area.
Further, the apparatus for detecting a region of interest may further include:
a screen area division module, configured to divide a preset screen into SN screen areas, SN being an integer greater than 1;
a calibration sample set construction module, configured to construct calibration sample sets respectively, where the s-th calibration sample set includes F_s vector samples, each vector sample is the input information vector obtained when the s-th screen area is attended to, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, 1 ≤ s ≤ SN, and F_s is a positive integer;
a network training module, configured to train a fully connected neural network in an initial state using the calibration sample sets, to obtain the pre-trained fully connected neural network.
Further, the calibration sample set construction module may include:
a pattern display sub-module, configured to display a preset pattern at the center coordinate point of the s-th screen area of the screen;
a sample image collection sub-module, configured to collect frames of sample images respectively, where a sample image is a face image captured while a subject is attending to the pattern;
an input information vector construction sub-module, configured to construct an input information vector for each frame of sample image respectively;
a calibration sample set construction sub-module, configured to construct the input information vectors of the frames of sample images into the s-th calibration sample set.
Further, the eye attention area determination module may further include:
a reference coordinate point acquisition sub-module, configured to acquire reference coordinate points, where the reference coordinate points are the output coordinate points corresponding to K frames of face images collected before the target face image, and K is a positive integer;
a first coordinate point set construction sub-module, configured to construct the output coordinate point corresponding to the target face image and the reference coordinate points into a first coordinate point set;
an outlier removal sub-module, configured to remove outliers from the first coordinate point set to obtain a second coordinate point set;
a smoothing sub-module, configured to calculate a mean coordinate point of the second coordinate point set, and determine the mean coordinate point of the second coordinate point set as the smoothed output coordinate point.
Further, the head posture calculation module may include:
an affine matrix calculation sub-module, configured to calculate an affine matrix of the target face image;
a rotation matrix calculation sub-module, configured to calculate a rotation matrix of the target face image according to the affine matrix;
a head posture calculation sub-module, configured to calculate the head posture according to the rotation matrix.
Further, the eye image extraction module may include:
a face key point detection sub-module, configured to perform face key point detection in the target face image, to obtain position information of each face key point in the target face image;
an eye area determination sub-module, configured to determine, according to the position information of each face key point, a first area of the left-eye image in the target face image and a second area of the right-eye image in the target face image;
an eye image extraction sub-module, configured to extract the left-eye image from the first area and extract the right-eye image from the second area.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of any one of the above methods for detecting a region of interest are implemented.
A fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of any one of the above methods for detecting a region of interest are implemented.
A fifth aspect of the embodiments of the present application provides a computer program product; when the computer program product runs on a terminal device, the terminal device is caused to execute the steps of any one of the above methods for detecting a region of interest.
Compared with the prior art, the embodiments of the present application have the following beneficial effects: the embodiments of the present application acquire a target face image to be detected; calculate the head posture of the target face image; extract a left-eye image and a right-eye image from the target face image; perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain position information of each eye key point in the left-eye image and the right-eye image; determine a line-of-sight direction according to the head posture, the left-eye image and the right-eye image; and determine an eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image. In the embodiments of the present application, no expensive precision instrument is required; instead, the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image are obtained by image analysis of face images, and the eye attention area is determined accordingly, which greatly reduces cost and allows much wider application.
BRIEF DESCRIPTION OF THE DRAWINGS
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an embodiment of a method for detecting a region of interest in an embodiment of the present application;
FIG. 2 is a schematic flowchart of calculating the head posture of a target face image;
FIG. 3 is a schematic flowchart of extracting a left-eye image and a right-eye image from a target face image;
FIG. 4 is a schematic diagram of face key points;
FIG. 5 is a schematic diagram of eye key points;
FIG. 6 is a schematic flowchart of determining an eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image;
FIG. 7 is a schematic flowchart of smoothing the output coordinate point;
FIG. 8 is a schematic flowchart of training the fully connected neural network;
FIG. 9 is a schematic diagram of dividing the screen into several screen areas;
FIG. 10 is a schematic flowchart of constructing a calibration sample set;
FIG. 11 is a structural diagram of an embodiment of an apparatus for detecting a region of interest in an embodiment of the present application;
FIG. 12 is a schematic block diagram of a terminal device in an embodiment of the present application.
DETAILED DESCRIPTION
To make the purpose, features and advantages of the present application more obvious and understandable, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the embodiments described below are only some, rather than all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, an embodiment of a method for detecting a region of interest in an embodiment of the present application may include:
Step S101: Acquire a target face image to be detected.
The execution subject of the embodiments of the present application may be a terminal device with a camera and a screen, including but not limited to a desktop computer, a notebook computer, a palmtop computer, a smart phone and a smart TV.
In a specific implementation of the embodiments of the present application, when a user is watching the screen of the terminal device, the terminal device may collect the current face image, that is, the target face image, through a camera facing the user.
Step S102: Calculate the head posture of the target face image.
In a specific implementation of the embodiments of the present application, step S102 may specifically include the process shown in FIG. 2:
Step S1021: Calculate the affine matrix of the target face image.
In the embodiments of the present application, a pre-trained 3D Morphable Model (3DMM) is preferably used to process the target face image. 3DMM is a three-dimensional face reconstruction model that can obtain corresponding three-dimensional face information from two-dimensional face information; the key is to determine the mapping relationship between the two-dimensional face information and the three-dimensional face information, and this mapping relationship can be expressed in the form of a matrix, that is, an affine matrix.
In the embodiments of the present application, the target face image records two-dimensional face information, denoted here as x_2d; the corresponding three-dimensional face information is denoted as X_3d and the affine matrix is denoted as Matrix, so that: x_2d = Matrix * X_3d.
Through the processing of the 3DMM, a 3×4 affine matrix can be obtained, namely:
Matrix = [ r_11 r_12 r_13 t_1 ; r_21 r_22 r_23 t_2 ; r_31 r_32 r_33 t_3 ]
where the first three columns of Matrix form a linear transformation matrix, r_ij is the element in the i-th row and j-th column of the linear transformation matrix, 1 ≤ i ≤ 3, 1 ≤ j ≤ 3; the last column of Matrix is a translation vector, t_k is the k-th element of the translation vector, and 1 ≤ k ≤ 3.
Step S1022: Calculate the rotation matrix of the target face image according to the affine matrix.
Specifically, a first vector R_1 and a second vector R_2 can first be extracted from the affine matrix according to the following formula:
R_1 = [r_11 r_12 r_13], R_2 = [r_21 r_22 r_23];
Then, the first vector and the second vector can be normalized respectively according to the following formula:
R̄_1 = R_1 / ||R_1||_2, R̄_2 = R_2 / ||R_2||_2;
where ||R_1||_2 is the norm of the first vector, ||R_2||_2 is the norm of the second vector, R̄_1 is the normalized first vector, and R̄_2 is the normalized second vector.
Finally, the rotation matrix can be calculated according to the following formula:
[equation image in the original: the rotation matrix M is assembled from the normalized vectors R̄_1 and R̄_2]
where M is the rotation matrix.
Step S1023: Calculate the head posture according to the rotation matrix.
Specifically, the head posture can be calculated according to the following formula:
[equation image in the original: the pitch angle x, the yaw angle y and the roll angle z are computed from the elements of the rotation matrix M]
where x is the pitch angle of the head posture, y is the yaw angle of the head posture, and z is the roll angle of the head posture.
Through the process shown in FIG. 2, a high-precision head posture can be obtained; using this head posture as a basis can greatly improve the accuracy of the region-of-interest detection result.
It should be noted that the process shown in FIG. 2 is only one preferred way of calculating the head posture; in practical applications, other ways of calculating the head posture may be selected according to the specific situation, which is not specifically limited in the embodiments of the present application.
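The following Python sketch illustrates how steps S1021 to S1023 can be chained: the first two rows of the linear part of the affine matrix are normalized, a rotation matrix is completed, and the pitch, yaw and roll angles are read from it. The cross-product completion of the third row and the particular Euler-angle convention are assumptions made for illustration, since the original formulas appear only as equation images in the source.

```python
import numpy as np

def head_pose_from_affine(matrix_3x4):
    """Estimate head pose (pitch, yaw, roll in degrees) from a 3x4 affine matrix.

    Sketch of steps S1021-S1023: take the first two rows of the linear part,
    L2-normalize them, complete a rotation matrix, and convert it to Euler
    angles. The cross-product third row and the angle convention below are
    assumptions, not formulas taken from the source.
    """
    R1 = matrix_3x4[0, :3]
    R2 = matrix_3x4[1, :3]
    r1 = R1 / np.linalg.norm(R1)          # normalized first vector
    r2 = R2 / np.linalg.norm(R2)          # normalized second vector
    r3 = np.cross(r1, r2)                 # assumed third row of the rotation matrix
    M = np.stack([r1, r2, r3])            # rotation matrix

    x = np.degrees(np.arctan2(M[2, 1], M[2, 2]))                      # pitch
    y = np.degrees(np.arctan2(-M[2, 0], np.hypot(M[2, 1], M[2, 2])))  # yaw
    z = np.degrees(np.arctan2(M[1, 0], M[0, 0]))                      # roll
    return x, y, z
```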
Step S103: Extract the left-eye image and the right-eye image from the target face image.
In a specific implementation of the embodiments of the present application, step S103 may specifically include the process shown in FIG. 3:
Step S1031: Perform face key point detection in the target face image, to obtain position information of each face key point in the target face image.
The face key points include, but are not limited to, key points of the eyebrows, eyes, nose, mouth, facial contour and other parts. FIG. 4 is a schematic diagram of face key points, in which numbers 0 to 16 are key points of the facial contour, numbers 17 to 21 are key points of the left eyebrow, numbers 22 to 26 are key points of the right eyebrow, numbers 27 to 35 are key points of the nose, numbers 36 to 41 are key points of the left eye, numbers 42 to 47 are key points of the right eye, and numbers 48 to 67 are key points of the mouth.
In a specific implementation of the embodiments of the present application, the 3DMM may be used to perform face key point detection in the target face image, so as to obtain the position information of each face key point in the target face image.
Of course, in other specific implementations of the embodiments of the present application, other detection models commonly used in the prior art may be used to perform face key point detection in the target face image, which is not specifically limited in the embodiments of the present application.
Step S1032: Determine, according to the position information of each face key point, a first area of the left-eye image in the target face image and a second area of the right-eye image in the target face image.
As shown in FIG. 4, numbers 36 to 41 are the key points of the left eye and numbers 42 to 47 are the key points of the right eye. Here, the minimum abscissa of the left eye is denoted as left_x_min, the maximum abscissa of the left eye as left_x_max, the minimum ordinate of the left eye as left_y_min, the maximum ordinate of the left eye as left_y_max, the minimum abscissa of the right eye as right_x_min, the maximum abscissa of the right eye as right_x_max, the minimum ordinate of the right eye as right_y_min, and the maximum ordinate of the right eye as right_y_max.
In a specific implementation of the embodiments of the present application, the rectangular area (denoted as LA1) formed by the following four coordinate points may be used as the first area: (left_x_min, left_y_max), (left_x_min, left_y_min), (left_x_max, left_y_max), (left_x_max, left_y_min); and the rectangular area (denoted as RA1) formed by the following four coordinate points may be used as the second area: (right_x_min, right_y_max), (right_x_min, right_y_min), (right_x_max, right_y_max), (right_x_max, right_y_min).
Considering that cropping the left-eye and right-eye images directly from these extreme values may lose edge information, in another specific implementation of the embodiments of the present application, LA1 may be expanded outward to obtain a new rectangular area LA2, and LA2 is used as the first area; RA1 is expanded outward to obtain a new rectangular area RA2, and RA2 is used as the second area.
Here, the coordinates of the four vertices of LA2 are denoted as (left_x_min_new, left_y_max_new), (left_x_min_new, left_y_min_new), (left_x_max_new, left_y_max_new), (left_x_max_new, left_y_min_new), so that:
left_x_min_new = left_x_min - p × (left_x_max - left_x_min);
left_x_max_new = left_x_max + p × (left_x_max - left_x_min);
left_y_min_new = left_y_min - p × (left_y_max - left_y_min);
left_y_max_new = left_y_max + p × (left_y_max - left_y_min);
The coordinates of the four vertices of RA2 are denoted as (right_x_min_new, right_y_max_new), (right_x_min_new, right_y_min_new), (right_x_max_new, right_y_max_new), (right_x_max_new, right_y_min_new), so that:
right_x_min_new = right_x_min - p × (right_x_max - right_x_min);
right_x_max_new = right_x_max + p × (right_x_max - right_x_min);
right_y_min_new = right_y_min - p × (right_y_max - right_y_min);
right_y_max_new = right_y_max + p × (right_y_max - right_y_min);
where p is a preset expansion coefficient whose specific value can be set according to the actual situation; it is preferably set to 1.4, that is, the width and height of the final eye image are 1.4 times those of the original eye image. In this way, the loss of edge information can be effectively avoided.
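As an illustrative sketch, the following Python function applies the expansion formulas given above to one eye's key point coordinates; the function name, variable names and the cropping usage are hypothetical and not taken from the source.

```python
def expand_eye_box(xs, ys, p=1.4):
    """Expand an eye bounding box outward by p times its extent on each side
    (sketch of the LA2/RA2 construction in step S1032). `xs`/`ys` are the
    x/y coordinates of one eye's key points, e.g. points 36-41 of FIG. 4
    for the left eye."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    w, h = x_max - x_min, y_max - y_min
    return (x_min - p * w, y_min - p * h,   # new top-left corner
            x_max + p * w, y_max + p * h)   # new bottom-right corner

# Hypothetical usage: crop the left-eye image from the face image array.
# x0, y0, x1, y1 = expand_eye_box(left_xs, left_ys)
# left_eye_img = face_img[int(max(y0, 0)):int(y1), int(max(x0, 0)):int(x1)]
```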
Step S1033: Extract the left-eye image from the first area, and extract the right-eye image from the second area.
Through the process shown in FIG. 3, the eye images are extracted according to the position information of the face key points obtained by key point detection, so that high-precision eye images (including the left-eye image and the right-eye image) can be obtained; using these eye images as a basis can greatly improve the accuracy of the region-of-interest detection result.
Step S104: Perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain position information of each eye key point in the left-eye image and the right-eye image.
As shown in FIG. 5, the eye key points include, but are not limited to, the iris center, the upper edge of the iris, the lower edge of the iris, the left edge of the iris, the right edge of the iris, the upper eyelid edge, the lower eyelid edge, the left eye corner and the right eye corner.
In a specific implementation of the embodiments of the present application, a pre-trained Stacked Hourglass Model (SHM) may be used to perform eye key point detection in the left-eye image and the right-eye image, so as to obtain the position information of each eye key point in the left-eye image and the right-eye image. The stacked hourglass model can transform the image at multiple scales, ensuring a large receptive field, and also generalizes well to blurred images, which enables the embodiments of the present application to achieve high accuracy even with an ordinary camera.
Of course, in other specific implementations of the embodiments of the present application, other detection models commonly used in the prior art may be used to perform eye key point detection in the left-eye image and the right-eye image, which is not specifically limited in the embodiments of the present application.
Step S105: Determine the line-of-sight direction according to the head posture, the left-eye image and the right-eye image.
In a specific implementation of the embodiments of the present application, a pre-trained gaze model may be used to process the head posture, the left-eye image and the right-eye image, so as to obtain the line-of-sight direction.
Of course, in other specific implementations of the embodiments of the present application, other gaze estimation models commonly used in the prior art may be used to determine the line-of-sight direction, which is not specifically limited in the embodiments of the present application.
Step S106: Determine the eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
In a specific implementation of the embodiments of the present application, step S106 may specifically include the process shown in FIG. 6:
Step S1061: Construct the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image into an input information vector.
That is, the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image are concatenated in sequence into a single row vector, namely the input information vector.
Step S1062: Process the input information vector using the pre-trained fully connected neural network to obtain an output coordinate point.
The specific number of layers of the fully connected neural network and the number of nodes in each layer can be set in advance according to the actual situation; each node is connected to all nodes of the previous layer and combines the features extracted by the previous layer. The embodiments of the present application preferably use a two-layer fully connected neural network to perform regression on the input information vector to obtain the output coordinate point. The specific training process of the fully connected neural network will be described in detail later and is not repeated here.
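A minimal sketch of steps S1061 and S1062 is given below: the input information vector is built by concatenating the head posture, the line-of-sight direction and the eye key point coordinates, and a two-layer fully connected network regresses it to a screen coordinate. The concatenation order, hidden-layer size and ReLU activation are illustrative assumptions; the source only specifies a two-layer fully connected network.

```python
import numpy as np

def build_input_vector(head_pose, gaze_dir, left_eye_pts, right_eye_pts):
    """Concatenate head pose, gaze direction and eye key point coordinates
    into one row vector (step S1061). Exact ordering and dimensions are
    assumptions; the source only states that the items are spliced in sequence."""
    return np.concatenate([
        np.ravel(head_pose),      # (pitch, yaw, roll)
        np.ravel(gaze_dir),       # line-of-sight direction
        np.ravel(left_eye_pts),   # (x, y) of each left-eye key point
        np.ravel(right_eye_pts),  # (x, y) of each right-eye key point
    ])

class TwoLayerRegressor:
    """A minimal two-layer fully connected network that regresses the input
    information vector to a 2-D output coordinate point (step S1062)."""

    def __init__(self, in_dim, hidden=64, rng=np.random.default_rng(0)):
        self.W1 = rng.normal(0.0, 0.01, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.01, (hidden, 2))
        self.b2 = np.zeros(2)

    def forward(self, v):
        h = np.maximum(v @ self.W1 + self.b1, 0.0)   # hidden layer (ReLU)
        return h @ self.W2 + self.b2                 # output coordinate (x, y)
```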
Step S1063: Determine the screen area corresponding to the output coordinate point as the eye attention area.
In the embodiments of the present application, the screen of the terminal device may be divided in advance into SN (SN is an integer greater than 1) screen areas, which are numbered from top to bottom and from left to right as screen area 1, screen area 2, ..., screen area s, ..., screen area SN, where 1 ≤ s ≤ SN. If the output coordinate point falls within the coordinate range of screen area 1, screen area 1 may be determined as the eye attention area; if the output coordinate point falls within the coordinate range of screen area 2, screen area 2 may be determined as the eye attention area; ...; if the output coordinate point falls within the coordinate range of screen area s, screen area s may be determined as the eye attention area; ...; if the output coordinate point falls within the coordinate range of screen area SN, screen area SN may be determined as the eye attention area.
By presetting the coordinate range of each screen area, after the output coordinate point is calculated it is only necessary to determine which coordinate range the output coordinate point belongs to in order to obtain the corresponding eye attention area; the amount of calculation is extremely small, which greatly improves the efficiency of region-of-interest detection.
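A sketch of the lookup in step S1063, assuming the screen is divided into a uniform Line × Column grid numbered top to bottom within each column and then left to right across columns (as in the 3 × 3 example of FIG. 9); the uniform grid is an assumption.

```python
def screen_area_index(x, y, screen_w, screen_h, lines, columns):
    """Map an output coordinate point (x, y) in pixels to a screen area
    number 1..SN, with SN = lines * columns. Area 1 is the upper-left cell
    and area 2 the cell below it, matching the numbering used in the text."""
    col = min(int(x / screen_w * columns), columns - 1)
    row = min(int(y / screen_h * lines), lines - 1)
    return col * lines + row + 1

# Example with the 3x3 division of FIG. 9 on a 1920x1080 screen:
# screen_area_index(100, 100, 1920, 1080, 3, 3)  -> 1 (upper-left area)
# screen_area_index(960, 540, 1920, 1080, 3, 3)  -> 5 (center area)
```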
In another specific implementation of the embodiments of the present application, after step S1062, the result may also be smoothed through the process shown in FIG. 7:
Step S701: Acquire reference coordinate points.
The reference coordinate points are the output coordinate points corresponding to the K frames of face images collected before the target face image, where K is a positive integer whose specific value can be set according to the actual situation; preferably, it can be set to 4.
Step S702: Construct the output coordinate point corresponding to the target face image and the reference coordinate points into a first coordinate point set.
Step S703: Remove outliers from the first coordinate point set to obtain a second coordinate point set.
Specifically, the mean coordinate point of the first coordinate point set may be calculated first, and then the distance between each coordinate point in the first coordinate point set and the mean coordinate point may be calculated respectively. If the distance between a coordinate point and the mean coordinate point is greater than a preset distance threshold, that coordinate point may be determined as an outlier and removed from the first coordinate point set. The specific value of the distance threshold can be set according to the actual situation and is not specifically limited in the embodiments of the present application.
Step S704: Calculate the mean coordinate point of the second coordinate point set, and determine the mean coordinate point of the second coordinate point set as the smoothed output coordinate point.
In this case, the screen area corresponding to the smoothed output coordinate point may be determined as the eye attention area.
The smoothing process shown in FIG. 7 can effectively reduce the influence of abnormal data; using the smoothed output coordinate point as a basis can greatly improve the accuracy of the region-of-interest detection result.
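The smoothing of steps S701 to S704 can be sketched as follows; the distance threshold value and the fallback when all points are rejected are assumptions, since the source leaves the threshold to be set according to the actual situation.

```python
import numpy as np

def smooth_output_point(current_pt, reference_pts, dist_threshold=50.0):
    """Smooth the current output coordinate point (steps S701-S704): pool it
    with the output points of the previous K frames, drop points farther than
    a threshold from the pooled mean, and return the mean of the rest. K is
    implied by the number of reference points passed in."""
    pts = np.vstack([np.atleast_2d(reference_pts), np.atleast_2d(current_pt)])
    mean = pts.mean(axis=0)                          # mean of the first set
    dists = np.linalg.norm(pts - mean, axis=1)
    kept = pts[dists <= dist_threshold]              # second coordinate point set
    if len(kept) == 0:                               # all rejected: assumed fallback
        return mean
    return kept.mean(axis=0)                         # smoothed output coordinate point
```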
Preferably, before step S106, the fully connected neural network may be trained in advance through the process shown in FIG. 8:
Step S801: Divide the preset screen into SN screen areas.
In the embodiments of the present application, the screen of the terminal device may be divided into SN screen areas, where SN = Line × Column, Line is the number of rows into which the screen is divided and Column is the number of columns into which the screen is divided; these screen areas are numbered from top to bottom and from left to right as screen area 1, screen area 2, ..., screen area s, ..., screen area SN.
FIG. 9 shows the screen area division when Line = 3 and Column = 3, that is, SN = 9.
Step S802: Construct the calibration sample sets respectively.
The s-th calibration sample set includes F_s vector samples, each vector sample is the input information vector obtained when the s-th screen area is attended to, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, and F_s is a positive integer.
Taking the s-th calibration sample set as an example, its specific construction process may include the steps shown in FIG. 10:
Step S8021: Display a preset pattern at the center coordinate point of the s-th screen area of the screen.
When constructing the s-th calibration sample set, a preset pattern may be displayed at the center of the s-th screen area of the screen to attract the subject's gaze to the s-th screen area. The pattern may be set as a circular pattern, a square pattern, a triangular pattern or the like according to the actual situation, and its form is not specifically limited in the embodiments of the present application. It should be noted that, to attract attention more effectively, the pattern may use a color that contrasts strongly with the screen background color; for example, if the screen background is black, the pattern may be displayed in white, red, green, purple and so on.
Step S8022: Collect frames of sample images respectively.
A sample image is a face image captured while the subject is attending to the pattern. After the pattern is displayed at the center of the s-th screen area of the screen, the subject's gaze is attracted to the pattern; at this point, the camera can be used to sequentially collect multiple frames of face images of the subject while attending to the pattern, namely the sample images.
Step S8023: Construct the input information vector of each frame of sample image respectively.
The calculation of the input information vector of each frame of sample image is similar to the foregoing process; for details, refer to the detailed processes in step S102, step S103, step S104, step S105 and step S1061, which are not repeated here.
Step S8024: Construct the input information vectors of the frames of sample images into the s-th calibration sample set.
By traversing the values of s from 1 to SN according to the process shown in FIG. 10, each calibration sample set can be constructed. That is, the pattern is displayed at the center of the 1st screen area (the upper-left screen area) and the 1st calibration sample set is constructed from the collected frames of sample images; the pattern is displayed at the center of the 2nd screen area (the middle-left screen area) and the 2nd calibration sample set is constructed from the collected frames of sample images; the pattern is displayed at the center of the 3rd screen area (the lower-left screen area) and the 3rd calibration sample set is constructed from the collected frames of sample images; and so on.
Through the process shown in FIG. 10, for each screen area, frames of sample images are collected by actual measurement while the subject attends to that screen area, the input information vector of each frame of sample image is calculated, and the corresponding calibration sample set is constructed, which provides a large amount of data for the subsequent region-of-interest detection and gives the final detection result higher accuracy.
Step S803: Train the fully connected neural network in the initial state using the calibration sample sets, to obtain the pre-trained fully connected neural network.
The training process of a neural network is a commonly used technique in the prior art; for details, refer to any neural network training method in the prior art, which is not repeated in the embodiments of the present application.
Through the process shown in FIG. 8, a large number of actually measured samples are collected in advance and the calibration sample sets corresponding to the respective screen areas are constructed; the fully connected neural network is trained on these measured data, so that the resulting fully connected neural network better matches the actual situation, and the eye attention area detected on this basis has higher accuracy.
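The following sketch shows how the calibration sample sets of steps S802 and S803 could be used to fit the two-layer regressor sketched earlier. The uniform grid for the area centers, the mean-squared-error loss and plain gradient descent are illustrative assumptions, since the source defers to any conventional neural network training method.

```python
import numpy as np

def area_center(s, screen_w, screen_h, lines, columns):
    """Center coordinate point of screen area s (1..SN), using the same
    column-major numbering as the lookup sketch above (assumed uniform grid)."""
    col, row = divmod(s - 1, lines)
    return np.array([(col + 0.5) * screen_w / columns,
                     (row + 0.5) * screen_h / lines])

def train_regressor(net, calibration_sets, centers, epochs=200, lr=1e-3):
    """Fit a TwoLayerRegressor on the calibration sample sets: every vector
    sample in calibration_sets[i] is labelled with centers[i], the center of
    screen area i+1. MSE loss, full-batch gradient descent (illustrative)."""
    X = np.vstack([np.vstack(samples) for samples in calibration_sets])
    Y = np.vstack([np.tile(centers[i], (len(samples), 1))
                   for i, samples in enumerate(calibration_sets)])
    for _ in range(epochs):
        h = np.maximum(X @ net.W1 + net.b1, 0.0)      # hidden activations
        pred = h @ net.W2 + net.b2                    # predicted coordinates
        grad_out = 2.0 * (pred - Y) / len(X)          # d(MSE)/d(pred)
        grad_h = (grad_out @ net.W2.T) * (h > 0)      # back through ReLU
        net.W2 -= lr * h.T @ grad_out
        net.b2 -= lr * grad_out.sum(axis=0)
        net.W1 -= lr * X.T @ grad_h
        net.b1 -= lr * grad_h.sum(axis=0)
    return net
```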
In summary, the embodiments of the present application acquire a target face image to be detected; calculate the head posture of the target face image; extract a left-eye image and a right-eye image from the target face image; perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain position information of each eye key point in the left-eye image and the right-eye image; determine a line-of-sight direction according to the head posture, the left-eye image and the right-eye image; and determine an eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image. In the embodiments of the present application, no expensive precision instrument is required; instead, the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image are obtained by image analysis of face images, and the eye attention area is determined accordingly, which greatly reduces cost and allows much wider application.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the method for detecting a region of interest described in the above embodiments, FIG. 11 shows a structural diagram of an embodiment of an apparatus for detecting a region of interest provided by an embodiment of the present application.
In this embodiment, an apparatus for detecting a region of interest may include:
an image acquisition module 1101, configured to acquire a target face image to be detected;
a head posture calculation module 1102, configured to calculate a head posture of the target face image;
an eye image extraction module 1103, configured to extract a left-eye image and a right-eye image from the target face image;
an eye key point detection module 1104, configured to perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain position information of each eye key point in the left-eye image and the right-eye image;
a line-of-sight direction determination module 1105, configured to determine a line-of-sight direction according to the head posture, the left-eye image and the right-eye image;
an eye attention area determination module 1106, configured to determine an eye attention area according to the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
Further, the eye attention area determination module may include:
an input information vector construction sub-module, configured to construct the head posture, the line-of-sight direction, and the position information of each eye key point in the left-eye image and the right-eye image into an input information vector;
a neural network processing sub-module, configured to process the input information vector using a pre-trained fully connected neural network to obtain an output coordinate point;
an eye attention area determination sub-module, configured to determine the screen area corresponding to the output coordinate point as the eye attention area.
Further, the region-of-interest detection apparatus may also include:
a screen area division module, configured to divide a preset screen into SN screen areas, SN being an integer greater than 1;
a calibration sample set construction module, configured to construct each calibration sample set separately, where the s-th calibration sample set includes F S vector samples, each vector sample is the input information vector obtained when the s-th screen area is being focused on, the label corresponding to each vector sample is the center coordinate point of the s-th screen area, 1 ≤ s ≤ SN, and F S is a positive integer; and
a network training module, configured to train an initial fully connected neural network with the calibration sample sets to obtain the pre-trained fully connected neural network.
Further, the calibration sample set construction module may include:
a pattern display sub-module, configured to display a preset pattern at the center coordinate point of the s-th screen area of the screen;
a sample image acquisition sub-module, configured to acquire each frame of sample image separately, the sample images being face images of the subject while the subject is focusing on the pattern;
an input information vector construction sub-module, configured to construct the input information vector of each frame of sample image separately; and
a calibration sample set construction sub-module, configured to construct the input information vectors of the frames of sample images into the s-th calibration sample set.
Further, the eye focus area determination module may also include:
a reference coordinate point acquisition sub-module, configured to obtain reference coordinate points, the reference coordinate points being the output coordinate points corresponding to the K frames of face images acquired before the target face image, K being a positive integer;
a first coordinate point set construction sub-module, configured to construct the output coordinate point corresponding to the target face image and the reference coordinate points into a first coordinate point set;
an outlier removal sub-module, configured to remove outliers from the first coordinate point set to obtain a second coordinate point set; and
a smoothing sub-module, configured to compute the mean coordinate point of the second coordinate point set and determine the mean coordinate point of the second coordinate point set as the smoothed output coordinate point.
Further, the head posture computation module may include:
an affine matrix computation sub-module, configured to compute an affine matrix of the target face image;
a rotation matrix computation sub-module, configured to compute a rotation matrix of the target face image according to the affine matrix; and
a head posture computation sub-module, configured to compute the head posture according to the rotation matrix.
Further, the eye image extraction module may include:
a face key point detection sub-module, configured to perform face key point detection in the target face image to obtain the position information of each face key point in the target face image;
an eye area determination sub-module, configured to determine, according to the position information of each face key point, a first area of the left-eye image in the target face image and a second area of the right-eye image in the target face image; and
an eye image extraction sub-module, configured to extract the left-eye image from the first area and extract the right-eye image from the second area.
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus, modules, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
FIG. 12 shows a schematic block diagram of a terminal device provided by an embodiment of the present application; for ease of description, only the parts related to this embodiment of the present application are shown.
As shown in FIG. 12, the terminal device 12 of this embodiment includes a processor 120, a memory 121, and a computer program 122 stored in the memory 121 and executable on the processor 120. When the processor 120 executes the computer program 122, the steps in each of the foregoing region-of-interest detection method embodiments are implemented, for example, steps S101 to S106 shown in FIG. 1. Alternatively, when the processor 120 executes the computer program 122, the functions of the modules/units in each of the foregoing apparatus embodiments are implemented, for example, the functions of modules 1101 to 1106 shown in FIG. 11.
Exemplarily, the computer program 122 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 121 and executed by the processor 120 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 122 in the terminal device 12.
The terminal device 12 may be a computing device such as a desktop computer, a notebook, a palmtop computer, a smartphone, or a smart television. Those skilled in the art can understand that FIG. 12 is merely an example of the terminal device 12 and does not constitute a limitation on the terminal device 12, which may include more or fewer components than shown, combine certain components, or include different components; for example, the terminal device 12 may also include input and output devices, network access devices, a bus, and so on.
The processor 120 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor 120 may be the nerve center and command center of the terminal device 12; the processor 120 may generate operation control signals according to instruction operation codes and timing signals, and complete the control of fetching and executing instructions.
The memory 121 may be an internal storage unit of the terminal device 12, such as a hard disk or memory of the terminal device 12. The memory 121 may also be an external storage device of the terminal device 12, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 12. Further, the memory 121 may include both the internal storage unit and the external storage device of the terminal device 12. The memory 121 is used to store the computer program and other programs and data required by the terminal device 12. The memory 121 may also be used to temporarily store data that has been output or is to be output.
The terminal device 12 may also include a communication module, and the communication module may provide solutions for communication applied to network devices, including wireless local area networks (WLAN) (such as Wi-Fi networks), Bluetooth, Zigbee, mobile communication networks, global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The communication module may be one or more devices integrating at least one communication processing module. The communication module may include an antenna, and the antenna may have only one element or may be an antenna array including multiple elements. The communication module may receive electromagnetic waves through the antenna, perform frequency modulation and filtering on the electromagnetic wave signals, and send the processed signals to the processor. The communication module may also receive signals to be sent from the processor, perform frequency modulation and amplification on them, and convert them into electromagnetic waves through the antenna for radiation.
The terminal device 12 may also include a power management module, which may receive input from an external power supply, a battery, and/or a charger and supply power to the processor, the memory, the communication module, and so on.
The terminal device 12 may also include a display module, which may be used to display information input by the user or information provided to the user. The display module may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, a touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, the operation is transmitted to the processor to determine the type of the touch event, and the processor then provides a corresponding visual output on the display panel according to the type of the touch event.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division of the foregoing functional units and modules is merely used as an example for illustration. In practical applications, the foregoing functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are merely for the convenience of distinguishing them from one another and are not used to limit the protection scope of the present application. For the specific working processes of the units and modules in the foregoing system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art can realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; for example, the division of the modules or units is merely a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
An embodiment of the present application provides a computer program product; when the computer program product runs on the terminal device, the terminal device can implement the steps in each of the foregoing method embodiments.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the foregoing embodiments of the present application may also be completed by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the foregoing method embodiments. The computer program includes computer program code, and the computer program code may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The foregoing embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements for some of the technical features therein; and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (10)

  1. A method for detecting a region of interest, comprising:
    obtaining a target face image to be detected;
    computing a head posture of the target face image;
    extracting a left-eye image and a right-eye image from the target face image;
    performing eye key point detection in the left-eye image and the right-eye image respectively to obtain position information of each eye key point in the left-eye image and the right-eye image;
    determining a line of sight direction according to the head posture, the left-eye image, and the right-eye image; and
    determining an eye focus area according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
  2. The method for detecting a region of interest according to claim 1, wherein the determining an eye focus area according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image comprises:
    constructing the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image into an input information vector;
    processing the input information vector with a pre-trained fully connected neural network to obtain an output coordinate point; and
    determining a screen area corresponding to the output coordinate point as the eye focus area.
  3. The method for detecting a region of interest according to claim 2, wherein before the processing the input information vector with a pre-trained fully connected neural network, the method further comprises:
    dividing a preset screen into SN screen areas, SN being an integer greater than 1;
    constructing each calibration sample set separately, wherein the s-th calibration sample set comprises F S vector samples, each vector sample is an input information vector obtained when the s-th screen area is being focused on, a label corresponding to each vector sample is a center coordinate point of the s-th screen area, 1≤s≤SN, and F S is a positive integer; and
    training an initial fully connected neural network with the calibration sample sets to obtain the pre-trained fully connected neural network.
  4. The method for detecting a region of interest according to claim 3, wherein the constructing each calibration sample set separately comprises:
    displaying a preset pattern at the center coordinate point of the s-th screen area of the screen;
    acquiring each frame of sample image separately, the sample images being face images of a subject while the subject is focusing on the pattern;
    constructing an input information vector of each frame of sample image separately; and
    constructing the input information vectors of the frames of sample images into the s-th calibration sample set.
  5. The method for detecting a region of interest according to claim 2, wherein after the processing the input information vector with a pre-trained fully connected neural network to obtain an output coordinate point, the method further comprises:
    obtaining reference coordinate points, the reference coordinate points being the output coordinate points corresponding to K frames of face images acquired before the target face image, K being a positive integer;
    constructing the output coordinate point corresponding to the target face image and the reference coordinate points into a first coordinate point set;
    removing outliers from the first coordinate point set to obtain a second coordinate point set; and
    computing a mean coordinate point of the second coordinate point set, and determining the mean coordinate point of the second coordinate point set as a smoothed output coordinate point.
  6. The method for detecting a region of interest according to any one of claims 1 to 5, wherein the computing a head posture of the target face image comprises:
    computing an affine matrix of the target face image;
    computing a rotation matrix of the target face image according to the affine matrix; and
    computing the head posture according to the rotation matrix.
  7. The method for detecting a region of interest according to any one of claims 1 to 5, wherein the extracting a left-eye image and a right-eye image from the target face image comprises:
    performing face key point detection in the target face image to obtain position information of each face key point in the target face image;
    determining, according to the position information of each face key point, a first area of the left-eye image in the target face image and a second area of the right-eye image in the target face image; and
    extracting the left-eye image from the first area, and extracting the right-eye image from the second area.
  8. An apparatus for detecting a region of interest, comprising:
    an image acquisition module, configured to obtain a target face image to be detected;
    a head posture computation module, configured to compute a head posture of the target face image;
    an eye image extraction module, configured to extract a left-eye image and a right-eye image from the target face image;
    an eye key point detection module, configured to perform eye key point detection in the left-eye image and the right-eye image respectively to obtain position information of each eye key point in the left-eye image and the right-eye image;
    a line of sight direction determination module, configured to determine a line of sight direction according to the head posture, the left-eye image, and the right-eye image; and
    an eye focus area determination module, configured to determine an eye focus area according to the head posture, the line of sight direction, and the position information of each eye key point in the left-eye image and the right-eye image.
  9. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the steps of the method for detecting a region of interest according to any one of claims 1 to 7 are implemented.
  10. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the computer program, the steps of the method for detecting a region of interest according to any one of claims 1 to 7 are implemented.
PCT/CN2020/109068 2019-10-29 2020-08-14 一种关注区域检测方法、装置、可读存储介质及终端设备 WO2021082635A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911028690.0 2019-10-29
CN201911028690.0A CN110909611B (zh) 2019-10-29 2019-10-29 一种关注区域检测方法、装置、可读存储介质及终端设备

Publications (1)

Publication Number Publication Date
WO2021082635A1 true WO2021082635A1 (zh) 2021-05-06

Family

ID=69815748

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/109068 WO2021082635A1 (zh) 2019-10-29 2020-08-14 一种关注区域检测方法、装置、可读存储介质及终端设备

Country Status (2)

Country Link
CN (1) CN110909611B (zh)
WO (1) WO2021082635A1 (zh)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909611B (zh) * 2019-10-29 2021-03-05 深圳云天励飞技术有限公司 一种关注区域检测方法、装置、可读存储介质及终端设备
CN111582040B (zh) * 2020-04-13 2023-10-13 武汉理工大学 一种船舶驾驶舱的人员定位方法、系统和存储介质
CN113743172B (zh) * 2020-05-29 2024-04-16 魔门塔(苏州)科技有限公司 一种人员注视位置检测方法及装置
CN111796874A (zh) * 2020-06-28 2020-10-20 北京百度网讯科技有限公司 一种设备唤醒的方法、装置、计算机设备和存储介质
CN112272279B (zh) * 2020-10-23 2023-04-28 岭东核电有限公司 作业信息展示方法、装置、计算机设备和存储介质
CN112348945B (zh) * 2020-11-02 2024-01-02 上海联影医疗科技股份有限公司 定位图像生成方法、装置、设备及介质
CN112416126B (zh) * 2020-11-18 2023-07-28 青岛海尔科技有限公司 页面滚动控制方法和装置、存储介质及电子设备
CN112883918B (zh) * 2021-03-22 2024-03-19 深圳市百富智能新技术有限公司 人脸检测方法、装置、终端设备及计算机可读存储介质
CN113743254B (zh) * 2021-08-18 2024-04-09 北京格灵深瞳信息技术股份有限公司 视线估计方法、装置、电子设备及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104361332A (zh) * 2014-12-08 2015-02-18 重庆市科学技术研究院 一种用于疲劳驾驶检测的人脸眼睛区域定位方法
US20180088671A1 (en) * 2016-09-27 2018-03-29 National Kaohsiung University Of Applied Sciences 3D Hand Gesture Image Recognition Method and System Thereof
CN108985210A (zh) * 2018-07-06 2018-12-11 常州大学 一种基于人眼几何特征的视线追踪方法及系统
CN109271914A (zh) * 2018-09-07 2019-01-25 百度在线网络技术(北京)有限公司 检测视线落点的方法、装置、存储介质和终端设备
CN110188728A (zh) * 2019-06-06 2019-08-30 四川长虹电器股份有限公司 一种头部姿态估计的方法及系统
CN110909611A (zh) * 2019-10-29 2020-03-24 深圳云天励飞技术有限公司 一种关注区域检测方法、装置、可读存储介质及终端设备
CN111046744A (zh) * 2019-11-21 2020-04-21 深圳云天励飞技术有限公司 一种关注区域检测方法、装置、可读存储介质及终端设备

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103839046B (zh) * 2013-12-26 2017-02-01 苏州清研微视电子科技有限公司 驾驶人注意力自动识别系统及其识别方法
CN105913487B (zh) * 2016-04-09 2018-07-06 北京航空航天大学 一种基于人眼图像中虹膜轮廓分析匹配的视线方向计算方法
WO2018097632A1 (en) * 2016-11-25 2018-05-31 Samsung Electronics Co., Ltd. Method and device for providing an image
CN108875524B (zh) * 2018-01-02 2021-03-02 北京旷视科技有限公司 视线估计方法、装置、系统和存储介质
CN108171218A (zh) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 一种基于深度外观注视网络的视线估计方法
CN108268858B (zh) * 2018-02-06 2020-10-16 浙江大学 一种高鲁棒的实时视线检测方法
CN109492514A (zh) * 2018-08-28 2019-03-19 初速度(苏州)科技有限公司 一种单相机采集人眼视线方向的方法及系统
CN109508679B (zh) * 2018-11-19 2023-02-10 广东工业大学 实现眼球三维视线跟踪的方法、装置、设备及存储介质
CN110046546B (zh) * 2019-03-05 2021-06-15 成都旷视金智科技有限公司 一种自适应视线追踪方法、装置、系统及存储介质


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553928A (zh) * 2021-07-13 2021-10-26 厦门瑞为信息技术有限公司 一种人脸活体检测方法、系统和计算机设备
CN113553928B (zh) * 2021-07-13 2024-03-22 厦门瑞为信息技术有限公司 一种人脸活体检测方法、系统和计算机设备
CN114419738A (zh) * 2022-03-29 2022-04-29 北京市商汤科技开发有限公司 一种姿态检测方法、装置、电子设备以及存储介质
CN116030512A (zh) * 2022-08-04 2023-04-28 荣耀终端有限公司 注视点检测方法及装置
CN116030512B (zh) * 2022-08-04 2023-10-31 荣耀终端有限公司 注视点检测方法及装置
CN115981772A (zh) * 2023-01-10 2023-04-18 呼和浩特市凡诚电子科技有限公司 一种基于大数据的计算机性能控制系统及方法

Also Published As

Publication number Publication date
CN110909611B (zh) 2021-03-05
CN110909611A (zh) 2020-03-24

Similar Documents

Publication Publication Date Title
WO2021082635A1 (zh) 一种关注区域检测方法、装置、可读存储介质及终端设备
CN111046744B (zh) 一种关注区域检测方法、装置、可读存储介质及终端设备
WO2021227726A1 (zh) 面部检测、图像检测神经网络训练方法、装置和设备
US11747898B2 (en) Method and apparatus with gaze estimation
WO2021213067A1 (zh) 物品显示方法、装置、设备及存储介质
CN111192293B (zh) 一种运动目标位姿跟踪方法及装置
CN110570460B (zh) 目标跟踪方法、装置、计算机设备及计算机可读存储介质
CN106709404A (zh) 图像处理装置及图像处理方法
Hernandez et al. Accurate 3D face reconstruction via prior constrained structure from motion
CN110689043A (zh) 一种基于多重注意力机制的车辆细粒度识别方法及装置
CN109858333A (zh) 图像处理方法、装置、电子设备及计算机可读介质
CN112036331A (zh) 活体检测模型的训练方法、装置、设备及存储介质
CN112419326B (zh) 图像分割数据处理方法、装置、设备及存储介质
CN112132032A (zh) 交通标志牌检测方法、装置、电子设备及存储介质
CN112085534B (zh) 一种关注度分析方法、系统及存储介质
CN110689046A (zh) 图像识别方法、装置、计算机装置及存储介质
WO2023202285A1 (zh) 图像处理方法、装置、计算机设备及存储介质
CN110704652A (zh) 基于多重注意力机制的车辆图像细粒度检索方法及装置
CN111860484B (zh) 一种区域标注方法、装置、设备及存储介质
CN112561973A (zh) 训练图像配准模型的方法、装置和电子设备
Xie et al. Event-based stereo matching using semiglobal matching
CN114067428A (zh) 多视角多目标的跟踪方法、装置、计算机设备和存储介质
WO2021082636A1 (zh) 一种关注区域检测方法、装置、可读存储介质及终端设备
CN112906517B (zh) 一种自监督的幂律分布人群计数方法、装置和电子设备
CN114332927A (zh) 课堂举手行为检测方法、系统、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20883151

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20883151

Country of ref document: EP

Kind code of ref document: A1