WO2021098454A1 - Region of concern detection method and apparatus, and readable storage medium and terminal device - Google Patents


Info

Publication number
WO2021098454A1
Authority
WO
WIPO (PCT)
Prior art keywords: eye, coordinates, point, interest, image
Application number
PCT/CN2020/124098
Other languages
French (fr)
Chinese (zh)
Inventor
王杉杉
胡文泽
王孝宇
Original Assignee
深圳云天励飞技术股份有限公司
Application filed by 深圳云天励飞技术股份有限公司 filed Critical 深圳云天励飞技术股份有限公司
Publication of WO2021098454A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction
    • G06V40/19 Sensors therefor
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • The present invention relates to the field of image processing technology, and in particular to a method, a device, a computer-readable storage medium and a terminal device for detecting a region of interest.
  • Offline screens mainly include televisions, building screens, cinema ticket machines, cinema LED displays, supermarket and convenience-store POS screens, taxi-mounted screens, projectors, and express-locker screens, and they touch almost every aspect of users' lives. Offline advertising based on such screens has a natural advantage in attracting consumers' attention, but advertisers cannot quickly learn whether the design and content of an offline advertisement actually appeal to consumers, so feedback on offline advertising is usually slower and less accurate than online feedback. As a result, some advertisements are placed imprecisely and inefficiently. In the prior art, precision instruments such as eye trackers can be used to track eye movement and determine the area of interest, but such instruments are very expensive and difficult to apply widely.
  • In view of this, the embodiments of the present application provide a method, a device, a computer-readable storage medium, and a terminal device for detecting an area of interest, to solve the problem that the existing area-of-interest detection methods are very expensive and therefore difficult to apply widely.
  • The first aspect of the embodiments of the present application provides a method for detecting a region of interest, which may include:
  • the eye area of interest is determined according to the attention direction of the line of sight, the coordinates of the center point of the left eye pupil, and the coordinates of the center point of the right eye pupil.
  • the determining the eye area of interest according to the line of sight attention direction, the coordinates of the center point of the left eye pupil, and the coordinates of the center point of the right eye pupil includes:
  • the eye area of interest is determined according to the coordinates of the eye point of interest.
  • the determining the eye area of interest according to the coordinates of the eye point of interest includes:
  • the screen area where the pixel position is located is determined as the eye focus area.
  • the converting the coordinates of the eye focus point according to the coordinates of the preset reference pixel point includes:
  • the first distance and the second distance are respectively calculated according to the coordinates of the reference pixel point and the coordinates of the eye point of interest, where the first distance is the distance between the reference pixel point and the eye point of interest in the preset first coordinate axis direction, and the second distance is the distance between the reference pixel point and the eye point of interest in the preset second coordinate axis direction.
  • the determining the attention direction of the line of sight according to the left-eye image, the right-eye image, and the head posture includes:
  • the processing process of the line-of-sight prediction model includes:
  • the binocular feature information and the head posture are fused to obtain the attention direction of the line of sight.
  • the method may further include:
  • each training sample includes a pre-collected left-eye image, right-eye image and head posture of a subject, the label corresponding to each training sample is a pre-calibrated line-of-sight attention direction, and SN is a positive integer;
  • the method may further include:
  • a second aspect of the embodiments of the present application provides a device for detecting a region of interest, which may include:
  • the face image acquisition module is used to acquire the target face image to be detected
  • a head posture calculation module for calculating the head posture of the target face image
  • An eye image extraction module for extracting left eye images and right eye images in the target face image
  • a sight attention direction determining module configured to determine the sight attention direction according to the left-eye image, the right-eye image, and the head posture
  • An eye key point detection module configured to detect key eye points in the left eye image and the right eye image to obtain the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil;
  • the eye area of interest determination module is configured to determine the eye area of interest according to the eye-focusing direction, the coordinates of the center point of the left eye pupil, and the coordinates of the right eye pupil center point.
  • the eye attention area determination module may include:
  • a center point coordinate calculation sub-module configured to calculate the coordinates of the center point of the pupils of both eyes according to the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil;
  • a point-to-surface distance calculation sub-module for calculating the point-to-surface distance between the center point of the pupils of the two eyes and a preset screen according to the coordinates of the center points of the pupils of the two eyes;
  • An eye point of interest coordinate calculation sub-module configured to calculate the coordinates of the eye point of interest according to the attention direction of the line of sight, the coordinates of the center points of the pupils of the two eyes, and the distance between the points;
  • the eye area of interest determination sub-module is used to determine the eye area of interest according to the coordinates of the eye point of interest.
  • Further, the eye area of interest determination sub-module may include:
  • a coordinate conversion unit configured to convert the coordinates of the eye point of interest according to the coordinates of a preset reference pixel point to obtain the pixel position of the eye point of interest on the screen;
  • the pixel position determining unit is configured to determine whether the pixel position is within the range of the screen according to a preset screen resolution
  • a screen area determining unit configured to determine the screen area where the pixel position is located according to a preset screen area division rule if the pixel position is within the range of the screen;
  • the eye attention area determining unit is configured to determine the screen area where the pixel position is located as the eye attention area.
  • the coordinate conversion unit may include:
  • the distance calculation subunit is configured to calculate a first distance and a second distance respectively according to the coordinates of the reference pixel point and the coordinates of the eye point of interest, where the first distance is the reference pixel point and the eye The distance of the point of interest in the preset first coordinate axis direction, where the second distance is the distance between the reference pixel point and the eye point of interest in the preset second coordinate axis direction;
  • a first pixel position calculation subunit configured to calculate the pixel position of the eye point of interest in the direction of the first coordinate axis according to the first distance and a preset first conversion coefficient
  • the second pixel position calculation subunit is configured to calculate the pixel position of the eye point of interest in the direction of the second coordinate axis according to the second distance and a preset second conversion coefficient.
  • the line-of-sight attention direction determining module is specifically configured to input the left-eye image, the right-eye image, and the head posture into a pre-trained line-of-sight prediction model for processing to obtain the line-of-sight attention direction ;
  • the sight attention direction determining module may include:
  • the feature information extraction sub-module is configured to extract feature information from the left-eye image and the right-eye image to obtain left-eye feature information and right-eye feature information;
  • the binocular feature information determining sub-module is configured to perform fusion processing on the left eye feature information and the right eye feature information to obtain binocular feature information;
  • the gaze attention direction determination sub-module is used to perform fusion processing on the binocular feature information and the head posture to obtain the gaze attention direction.
  • the device for detecting a region of interest may further include:
  • the sample set construction module is used to construct a training sample set, wherein the training sample set includes SN training samples, and each training sample includes pre-collected left-eye images, right-eye images and head poses of the subject, And the label corresponding to each training sample is the pre-calibrated line of sight attention direction, and SN is a positive integer;
  • the model training module is used to train the line-of-sight prediction model in the initial state by using the training sample set to obtain the pre-trained line-of-sight prediction model.
  • the device for detecting a region of interest may further include:
  • the facial feature information extraction module is used to extract the facial feature information in the target face image
  • a user information determining module configured to determine user information corresponding to the target face image according to the facial feature information
  • the screen display information determining module is used to determine the screen display information corresponding to the eye focus area
  • the correspondence relationship establishment module is used to establish the correspondence relationship between the user information and the screen display information.
  • The third aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the steps of any of the above region-of-interest detection methods are implemented.
  • The fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the computer program, the steps of any of the above region-of-interest detection methods are implemented.
  • The fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the steps of any of the above region-of-interest detection methods.
  • The embodiments of the application have the following beneficial effects: the embodiments obtain a target face image to be detected; calculate the head posture of the target face image; extract the left-eye image and the right-eye image from the target face image; determine the attention direction of the line of sight according to the left-eye image, the right-eye image and the head posture; perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil; and determine the eye focus area according to the attention direction of the line of sight and the two pupil center point coordinates. No expensive precision instrument is required: the eye focus area is determined by analyzing and processing the face image, which greatly reduces cost and makes wide application possible.
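  • For illustration only, the following is a minimal Python sketch of how the six steps might be orchestrated; every helper name (detect_face, estimate_head_pose, and so on) is a hypothetical placeholder for the corresponding step, not an API defined by this application:

```python
def detect_attention_region(frame, steps):
    """Hypothetical orchestration of steps S101-S106; `steps` is any object
    supplying the six step implementations, and every name here is a
    placeholder rather than an API defined by this application."""
    face = steps.detect_face(frame)                        # S101
    if face is None:
        return None
    head_pose = steps.estimate_head_pose(face)             # S102 (ICP)
    left_eye, right_eye = steps.extract_eyes(face)         # S103
    gaze = steps.predict_gaze(left_eye, right_eye, head_pose)    # S104
    l_pupil, r_pupil = steps.detect_pupils(left_eye, right_eye)  # S105
    return steps.locate_region(gaze, l_pupil, r_pupil)     # S106
```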
  • FIG. 1 is a flowchart of an embodiment of a method for detecting a region of interest in an embodiment of the application
  • Figure 2 is a schematic diagram of a 3D coordinate system established in an embodiment of the application
  • Figure 3 is a schematic diagram of the network structure of the line-of-sight prediction model
  • Figure 4 is a schematic diagram of the attention direction of the line of sight
  • Fig. 5 is a schematic diagram of the center point of the pupil
  • Fig. 6 is a schematic flow chart of determining the eye area of interest according to the attention direction of the line of sight, the coordinates of the center point of the left eye pupil, and the coordinates of the right eye pupil center point;
  • Figure 7 is a schematic diagram of calculating the coordinates of the eye point of interest
  • FIG. 8 is a schematic flowchart of determining the eye area of interest according to the coordinates of the eye point of interest
  • FIG. 9 is a structural diagram of an embodiment of a device for detecting a region of interest in an embodiment of the application.
  • FIG. 10 is a schematic block diagram of a terminal device in an embodiment of this application.
  • an embodiment of a method for detecting a region of interest in an embodiment of the present application may include:
  • Step S101 Obtain a target face image to be detected.
  • In order to determine the user's attention area on the screen, a depth camera may be configured for the screen.
  • the depth camera may be built into the screen or used as an external device of the screen.
  • Specifically, the camera coordinate system of the depth camera can be used to establish a 3D coordinate system as shown in FIG. 2, and the four corner points of the screen, including the upper-left corner (that is, left_up in FIG. 2) and the upper-right corner, can be pre-calibrated in this coordinate system.
  • The execution subject of the embodiments of this application may be a terminal device connected to the screen in a wired or wireless manner, including but not limited to desktop computers, notebooks, palmtop computers, smart phones, servers, and other terminal equipment with data processing functions.
  • If the screen is a smart screen with data processing functions, the screen itself can also serve as the terminal device that executes the embodiments of the present application, without relying on other external terminal devices.
  • In a specific implementation, an image around the screen can be collected by the depth camera, and face detection can be performed on the image. If a face is detected, the current face image, that is, the target face image, can be captured.
  • Step S102 Calculate the head posture of the target face image.
  • Specifically, 3D key points of the face can be detected in the target face image, and the head pose of the target face image can be calculated according to these 3D key points.
  • an Iterative Closest Point (ICP) algorithm may be used to calculate the head posture.
  • In the ICP algorithm, a reference point cloud is preset as the comparison benchmark, and it contains each 3D key point used as a reference. The detected 3D key points are then assembled into the point cloud of the target face image. Corresponding points between the two point clouds are determined using the nearest-neighbor criterion, the transformation between them is solved by the least-squares method, and this transformation is used to rotate the point cloud of the target face image, yielding an updated point cloud. The above process is repeated until a preset termination condition is reached and the iteration stops. The rotation angles of all iterations are superimposed, and the result is the head pose of the target face image.
  • The calculated head posture is denoted as headpose[theta, phi], where theta represents the upward or downward (pitch) angle of the head, and phi represents the deflection (yaw) angle of the head in the horizontal direction.
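  • As an illustrative sketch (not part of the original disclosure), the ICP loop described above can be written in numpy roughly as follows; the point-to-point variant, the rigid-update details, and the Euler-angle convention R = Ry(phi) @ Rx(theta) are assumptions:

```python
import numpy as np

def icp_head_pose(src, ref, iters=20, tol=1e-6):
    """Estimate the head pose by aligning the detected 3D face key points
    (src, Nx3) to a preset reference point cloud (ref, Mx3) with ICP."""
    src = np.asarray(src, dtype=float)
    ref = np.asarray(ref, dtype=float)
    R_total = np.eye(3)
    prev_err = np.inf
    for _ in range(iters):
        # nearest-neighbour correspondences (brute force, small point sets)
        d = np.linalg.norm(src[:, None, :] - ref[None, :, :], axis=2)
        matched = ref[np.argmin(d, axis=1)]
        # least-squares rotation between the centred point sets (Kabsch/SVD)
        sc, mc = src - src.mean(0), matched - matched.mean(0)
        U, _, Vt = np.linalg.svd(sc.T @ mc)
        S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ S @ U.T
        src = sc @ R.T + matched.mean(0)      # rotate and re-centre the cloud
        R_total = R @ R_total                 # superimpose the rotations
        err = np.mean(np.linalg.norm(src - matched, axis=1))
        if abs(prev_err - err) < tol:         # preset termination condition
            break
        prev_err = err
    # one possible Euler extraction, assuming R_total = Ry(phi) @ Rx(theta)
    theta = np.degrees(np.arcsin(-R_total[1, 2]))   # pitch: looking up/down
    phi = np.degrees(np.arctan2(-R_total[2, 0], R_total[0, 0]))  # yaw
    return theta, phi
```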
  • Step S103 Extract a left eye image and a right eye image in the target face image.
  • Specifically, the left-eye key points can be filtered out from the detected 3D key points. Denote the minimum and maximum abscissas of these left-eye key points as left_x_min and left_x_max, and the minimum and maximum ordinates as left_y_min and left_y_max. A rectangular area LA1 is formed by the following four coordinate points: (left_x_min, left_y_max), (left_x_min, left_y_min), (left_x_max, left_y_max), (left_x_max, left_y_min). The image in LA1 can be intercepted as the left-eye image using this minimum and maximum value information; alternatively, LA1 may be enlarged into a region LA2 and the image in LA2 used as the left-eye image. The extraction process of the right-eye image is similar to that of the left-eye image and is not repeated here.
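  • A minimal numpy sketch of this cropping step, assuming the eye key points have been projected to pixel coordinates; the optional margin (playing the role of the enlarged region LA2) is an assumption:

```python
import numpy as np

def crop_eye(image, eye_points, margin=0):
    """Crop an eye region from the face image given the eye key points in
    pixel coordinates (Kx2 array of (x, y)). margin=0 yields the tight box
    LA1; a positive margin yields an enlarged box in the role of LA2."""
    x_min, y_min = eye_points.min(axis=0).astype(int) - margin
    x_max, y_max = eye_points.max(axis=0).astype(int) + margin
    h, w = image.shape[:2]
    x_min, y_min = max(x_min, 0), max(y_min, 0)           # clamp to the image
    x_max, y_max = min(x_max, w - 1), min(y_max, h - 1)
    return image[y_min:y_max + 1, x_min:x_max + 1]
```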
  • Step S104 Determine the attention direction of the line of sight according to the left-eye image, the right-eye image and the head posture.
  • the left-eye image, the right-eye image, and the head posture may be input into a pre-trained line of sight prediction model for processing, so as to obtain the attention direction of the line of sight.
  • the line of sight prediction model uses a multi-input neural network structure.
  • As shown in FIG. 3, the line-of-sight prediction model first extracts feature information from the left-eye image and the right-eye image to obtain the left-eye feature information and the right-eye feature information, then fuses the left-eye feature information with the right-eye feature information to obtain binocular feature information, and finally fuses the binocular feature information with the head posture to obtain the attention direction of the line of sight, as shown in FIG. 4.
  • Specifically, a ResNet18 block (ie, the ResNet18 Block in Figure 3) is used to extract feature information from the left eye image (ie, the Left eye in Figure 3), and the extracted feature information is then sequentially subjected to average pooling processing (ie, Avg_pooling in Figure 3), fully connected layer processing (ie, FC_Left in Figure 3), batch normalization processing (ie, BN_Left in Figure 3), and activation function processing (ie, Relu_Left in Figure 3) to obtain the left-eye feature information.
  • Similarly, a ResNet18 block (ie, ResNet18 Block in Figure 3) is used to extract feature information from the right eye image (ie, Right eye in Figure 3), and the extracted feature information is sequentially subjected to average pooling processing (ie, Avg_pooling in Figure 3), fully connected layer processing (ie, FC_Right in Figure 3), batch normalization processing (ie, BN_Right in Figure 3), and activation function processing (ie, Relu_Right in Figure 3) to obtain the right-eye feature information.
  • the left-eye feature information and the right-eye feature information are obtained separately, the left-eye feature information and the right-eye feature information are spliced (ie EyesConcat in FIG. 3), and the two are merged, Then, the fused information is processed by the fully connected layer (ie EyesFc1 in FIG. 3) to obtain the binocular feature information.
  • the fully connected layer ie EyesFc1 in FIG. 3
  • Next, the binocular feature information and the head pose are spliced (ie, HeadConcat in Figure 3) and merged, and the fused information is subjected to batch normalization processing (ie, BN_Head in Figure 3), activation function processing (ie, Relu_Head in Figure 3), and fully connected layer processing (ie, Fc_Head in Figure 3) to obtain the attention direction of the line of sight (ie, Gaze in Figure 3).
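  • The following PyTorch sketch mirrors the structure of Figure 3 under several assumptions: a full torchvision resnet18 backbone stands in for the "ResNet18 Block", the 128-dimensional feature size is arbitrary, and the output is the two gaze angles [theta, phi]:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GazeNet(nn.Module):
    """Sketch of the multi-input network of Figure 3: two ResNet18 branches,
    per-eye FC + BN + ReLU, eye-feature fusion, then head-pose fusion."""

    def __init__(self, feat_dim=128):  # feature size is an assumption
        super().__init__()
        self.left_branch = resnet18()
        self.right_branch = resnet18()
        self.left_branch.fc = nn.Identity()   # keep the 512-d pooled features
        self.right_branch.fc = nn.Identity()  # (Avg_pooling is inside resnet18)
        self.fc_left = nn.Sequential(         # FC_Left -> BN_Left -> Relu_Left
            nn.Linear(512, feat_dim), nn.BatchNorm1d(feat_dim), nn.ReLU())
        self.fc_right = nn.Sequential(        # FC_Right -> BN_Right -> Relu_Right
            nn.Linear(512, feat_dim), nn.BatchNorm1d(feat_dim), nn.ReLU())
        self.eyes_fc1 = nn.Linear(2 * feat_dim, feat_dim)  # EyesFc1
        self.head = nn.Sequential(            # BN_Head -> Relu_Head -> Fc_Head
            nn.BatchNorm1d(feat_dim + 2), nn.ReLU(), nn.Linear(feat_dim + 2, 2))

    def forward(self, left_eye, right_eye, head_pose):
        l = self.fc_left(self.left_branch(left_eye))
        r = self.fc_right(self.right_branch(right_eye))
        eyes = self.eyes_fc1(torch.cat([l, r], dim=1))     # EyesConcat
        fused = torch.cat([eyes, head_pose], dim=1)        # HeadConcat
        return self.head(fused)                            # Gaze: [theta, phi]
```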
  • The attention direction of the line of sight output in angle form can be converted into vector form, denoted init_vector, whose components on the x-axis, y-axis, and z-axis are vectorx, vectory, and vectorz, respectively. The vector is then normalized according to the following formula to obtain the normalized vector of the attention direction of the sight line:

  gaze_vector = init_vector / norm

  where norm is the modulus of init_vector, and gaze_vector is the normalized vector of the attention direction of the line of sight, whose components on the x-axis, y-axis, and z-axis are gaze_vector[x], gaze_vector[y], and gaze_vector[z], respectively.
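  • The exact angle-to-vector formula is not reproduced above; the sketch below uses one common spherical convention (an assumption) and then applies the normalization formula given in the text:

```python
import numpy as np

def gaze_angles_to_vector(theta, phi):
    """Convert the gaze angles (radians) to a direction vector and normalize
    it. The spherical convention below is an assumption; only the
    normalization gaze_vector = init_vector / norm is given in the text."""
    init_vector = np.array([np.cos(theta) * np.sin(phi),   # vectorx
                            np.sin(theta),                 # vectory
                            np.cos(theta) * np.cos(phi)])  # vectorz
    norm = np.linalg.norm(init_vector)   # modulus of the direction
    gaze_vector = init_vector / norm     # already unit-length under this
    return gaze_vector                   # convention, but applied as in text
```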
  • In this way, the left-eye feature information, the right-eye feature information, and the head posture information are fused together and considered comprehensively, and the attention direction of the line of sight is predicted on this basis, which greatly improves the accuracy of the final prediction result.
  • Specifically, a training sample set may be constructed in advance, and the line-of-sight prediction model in its initial state can be trained using the training sample set to obtain the pre-trained line-of-sight prediction model. The training sample set includes SN training samples; each training sample includes a pre-collected left-eye image, right-eye image and head posture of a subject; the label corresponding to each training sample is a pre-calibrated line-of-sight attention direction; and SN is a positive integer.
  • The training process of the neural network is a common technology in the prior art; for details, reference may be made to any existing neural network training method, which is not repeated in the embodiments of the present application.
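  • For completeness, a generic supervised training loop over the SN samples might look as follows; the L1 loss, optimizer, and hyper-parameters are illustrative assumptions, not taken from the patent:

```python
import torch
from torch.utils.data import DataLoader

def train_gaze_model(model, dataset, epochs=10, lr=1e-3):
    """Generic training over the SN samples. The dataset is assumed to yield
    (left_eye, right_eye, head_pose, gaze_label) tuples; the loss and
    hyper-parameters are illustrative only."""
    loader = DataLoader(dataset, batch_size=64, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.L1Loss()
    model.train()
    for _ in range(epochs):
        for left_eye, right_eye, head_pose, gaze_label in loader:
            opt.zero_grad()
            pred = model(left_eye, right_eye, head_pose)
            loss = loss_fn(pred, gaze_label)  # label: pre-calibrated direction
            loss.backward()
            opt.step()
```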
  • Step S105 Perform eye key point detection in the left-eye image and the right-eye image, respectively, to obtain the coordinates of the center point of the left eye pupil and the coordinates of the right eye pupil center point.
  • For example, a preset eye fixed-point model (ELM) may be used to perform the eye key point detection. The obtained coordinates of the center point of the left eye pupil and of the center point of the right eye pupil are both three-dimensional coordinates, which can be denoted as (x_left, y_left, z_left) and (x_right, y_right, z_right), where x_left, y_left, and z_left are the coordinates of the left eye pupil center point on the x-axis, y-axis, and z-axis, and x_right, y_right, and z_right are the coordinates of the right eye pupil center point on the x-axis, y-axis, and z-axis, respectively.
  • Step S106 Determine an eye area of interest according to the line of sight attention direction, the coordinates of the center point of the left eye pupil, and the coordinates of the right eye pupil center point.
  • step S106 may specifically include the process shown in FIG. 6:
  • Step S1061 Calculate the coordinates of the center points of the pupils of both eyes according to the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil.
  • Specifically, the coordinates of the center point of the pupils of both eyes can be calculated according to the following formula:

  middle_pos = (left_iris_center + right_iris_center) / 2

  where left_iris_center and right_iris_center are the coordinates of the two pupil center points, middle_pos = (x_middle, y_middle, z_middle) is the coordinates of the center point of the pupils of both eyes, and x_middle, y_middle, and z_middle are its coordinates on the x-axis, y-axis, and z-axis, respectively.
  • Step S1062 Calculate the point-to-plane distance between the center point of the pupils of the eyes and the preset screen according to the coordinates of the center points of the pupils of the eyes.
  • Denote the normal vector of the screen plane as n = (A, B, C). The point-to-plane distance can then be calculated as:

  iris_distance = (A*x_middle + B*y_middle + C*z_middle) / sqrt(A² + B² + C²)

  where sqrt is the square root function and iris_distance is the distance between the center point of the pupils of both eyes and the screen.
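  • A small numpy sketch of steps S1061 and S1062, matching the two formulas above (the plane is taken through the camera origin, since the formula as given contains no offset term):

```python
import numpy as np

def pupil_midpoint_and_distance(left_iris_center, right_iris_center, n):
    """middle_pos = (left_iris_center + right_iris_center) / 2, followed by
    the point-to-plane distance to the screen plane with normal n = (A, B, C),
    exactly as in the formulas above."""
    middle_pos = (np.asarray(left_iris_center, dtype=float) +
                  np.asarray(right_iris_center, dtype=float)) / 2.0
    A, B, C = n
    iris_distance = (A * middle_pos[0] + B * middle_pos[1] +
                     C * middle_pos[2]) / np.sqrt(A**2 + B**2 + C**2)
    return middle_pos, iris_distance
```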
  • Step S1063 Calculate the coordinates of the eye-focused point according to the eye-focusing direction, the coordinates of the center points of the pupils of the two eyes, and the point-to-surface distance.
  • the eye focus point is the projection point of the line of sight on the screen.
  • The coordinates of the eye point of interest, denoted project_3d, can then be calculated from the attention direction of the line of sight, the coordinates middle_pos, and the distance iris_distance.
  • FIG. 7 shows a schematic diagram of calculating the coordinates of the eye point of interest.
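  • The text does not reproduce the formula for project_3d, so the sketch below uses the standard ray-plane intersection as an assumption; sign conventions depend on the calibrated coordinate system:

```python
import numpy as np

def project_gaze_to_screen(middle_pos, gaze_vector, n, iris_distance):
    """Compute project_3d by intersecting the line of sight with the screen
    plane: starting from middle_pos, travel along gaze_vector until the
    point-to-plane distance iris_distance is covered (an assumption, since
    the patent text does not reproduce the formula)."""
    n_unit = np.asarray(n, dtype=float) / np.linalg.norm(n)
    cos_angle = abs(np.dot(gaze_vector, n_unit))  # gaze vs. plane normal
    t = iris_distance / cos_angle                 # travel length along the ray
    return np.asarray(middle_pos) + t * np.asarray(gaze_vector)
```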
  • Step S1064 Determine the eye focus area according to the coordinates of the eye focus point.
  • step S1064 may specifically include the process shown in FIG. 8:
  • Step S10641 Transform the coordinates of the eye point of interest according to the coordinates of the preset reference pixel point to obtain the pixel position of the eye point of interest on the screen.
  • the reference pixel point may be any one of the four corner points shown in FIG. 2, and here, the upper left corner point is preferably determined as the reference pixel point.
  • Specifically, the first distance and the second distance may be calculated according to the coordinates of the reference pixel point and the coordinates of the eye point of interest; the pixel position of the eye point of interest in the direction of the first coordinate axis is then calculated according to the first distance and a preset first conversion coefficient, and the pixel position in the direction of the second coordinate axis is calculated according to the second distance and a preset second conversion coefficient.
  • The first distance is the distance between the reference pixel point and the eye focus point in the direction of the preset first coordinate axis (that is, the x-axis in FIG. 2); the second distance is the distance between the reference pixel point and the eye focus point in the direction of the preset second coordinate axis (that is, the y-axis in FIG. 2).
  • The first conversion coefficient is the number of pixels included in each unit length in the direction of the first coordinate axis; the second conversion coefficient is the number of pixels included in each unit length in the direction of the second coordinate axis.
  • Specifically, the pixel position may be calculated as:

  project_pixel[x] = (project_3d[x] - left_up[x]) * scalex
  project_pixel[y] = (project_3d[y] - left_up[y]) * scaley

  where project_3d[x] and project_3d[y] are the coordinates of the eye focus point in the directions of the first and second coordinate axes, left_up[x] and left_up[y] are the coordinates of the reference pixel point in those directions, scalex is the first conversion coefficient, scaley is the second conversion coefficient, and project_pixel[x] and project_pixel[y] are the pixel positions of the eye point of interest in the directions of the first and second coordinate axes, respectively.
  • In this way, the pixel position of the eye focus point on the screen is obtained, and the eye focus area can be accurately determined on this basis.
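  • A sketch of the conversion just described, using the reconstructed formulas above; left_up and the conversion coefficients are assumed to come from the pre-calibration of FIG. 2:

```python
def to_pixel_position(project_3d, left_up, scale_x, scale_y):
    """Convert the 3D focus point to a pixel position on the screen using
    the calibrated reference corner left_up and the conversion coefficients
    scalex/scaley (pixels per unit length along each axis)."""
    px = (project_3d[0] - left_up[0]) * scale_x   # first coordinate axis (x)
    py = (project_3d[1] - left_up[1]) * scale_y   # second coordinate axis (y)
    return px, py
```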
  • Step S10642 Determine whether the pixel position is within the range of the screen according to the preset screen resolution.
  • Specifically, denote the screen resolution of the screen as MaxX * MaxY. If the pixel position satisfies 0 ≤ project_pixel[x] ≤ MaxX and 0 ≤ project_pixel[y] ≤ MaxY, it can be determined that the pixel position is within the range of the screen; otherwise, it can be determined that the pixel position is not within the range of the screen.
  • If the pixel position is not within the range of the screen, the user is not paying attention to the content on the screen, and no subsequent processing is required; if the pixel position is within the range of the screen, the following steps continue to be executed.
  • Step S10643 Determine the screen area where the pixel position is located according to a preset screen area division rule.
  • Step S10644 Determine the screen area where the pixel position is located as the eye focus area.
  • Specifically, the screen can be divided in advance into KN screen areas (KN is an integer greater than 1), recorded in order from top to bottom and from left to right as screen area 1, screen area 2, ..., screen area k, ..., screen area KN, where 1 ≤ k ≤ KN. If the pixel position falls within the range of screen area k, then screen area k is determined as the eye focus area.
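  • A sketch of steps S10642 to S10644, assuming (as an illustration) that the preset division rule is a uniform grid of rows * cols = KN areas numbered row by row:

```python
def locate_screen_region(px, py, max_x, max_y, rows, cols):
    """Check 0 <= px <= MaxX and 0 <= py <= MaxY, then map the pixel position
    to one of the KN = rows * cols areas, numbered 1..KN from top to bottom
    and left to right (row-major numbering is an assumption)."""
    if not (0 <= px <= max_x and 0 <= py <= max_y):
        return None   # the user is not paying attention to the screen
    col = min(int(px * cols / max_x), cols - 1)
    row = min(int(py * rows / max_y), rows - 1)
    return row * cols + col + 1   # screen area k, the eye focus area
```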
  • Further, facial feature information in the target face image can be extracted, and user information corresponding to the target face image can be determined according to the facial feature information.
  • This user information includes but is not limited to age, gender, etc.
  • each divided screen area can be used to display different information, including but not limited to advertisements, news, announcements, and so on.
  • the screen display information corresponding to the eye focus area can be further determined, and the corresponding relationship between the user information and the screen display information can be established.
  • In summary, the embodiments of the application obtain a target face image to be detected; calculate the head posture of the target face image; extract the left-eye image and the right-eye image from the target face image; determine the attention direction of the line of sight according to the left-eye image, the right-eye image, and the head posture; perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil; and determine the eye area of interest according to the attention direction of the line of sight and the two pupil center point coordinates.
  • FIG. 9 shows a structural diagram of an embodiment of a device for detecting a region of interest provided in an embodiment of the present application.
  • a device for detecting a region of interest may include:
  • the face image acquisition module 901 is used to acquire the target face image to be detected
  • the head posture calculation module 902 is used to calculate the head posture of the target face image
  • the eye image extraction module 903 is configured to extract the left eye image and the right eye image in the target face image
  • the sight attention direction determining module 904 is configured to determine the sight attention direction according to the left-eye image, the right-eye image, and the head posture;
  • the eye key point detection module 905 is configured to perform eye key point detection in the left eye image and the right eye image to obtain the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil;
  • the eye area of interest determination module 906 is configured to determine the eye area of interest according to the eye-focusing direction, the coordinates of the center point of the left eye pupil, and the coordinates of the center point of the right eye pupil.
  • the eye attention area determination module may include:
  • a center point coordinate calculation sub-module configured to calculate the coordinates of the center point of the pupils of both eyes according to the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil;
  • a point-to-surface distance calculation sub-module for calculating the point-to-surface distance between the center point of the pupils of the two eyes and a preset screen according to the coordinates of the center points of the pupils of the two eyes;
  • An eye point of interest coordinate calculation sub-module configured to calculate the coordinates of the eye point of interest according to the attention direction of the line of sight, the coordinates of the center points of the pupils of the two eyes, and the distance between the points;
  • the eye area of interest determination sub-module is used to determine the eye area of interest according to the coordinates of the eye point of interest.
  • Further, the eye area of interest determination sub-module may include:
  • a coordinate conversion unit configured to convert the coordinates of the eye point of interest according to the coordinates of a preset reference pixel point to obtain the pixel position of the eye point of interest on the screen;
  • the pixel position determining unit is configured to determine whether the pixel position is within the range of the screen according to a preset screen resolution
  • a screen area determining unit configured to determine the screen area where the pixel position is located according to a preset screen area division rule if the pixel position is within the range of the screen;
  • the eye attention area determining unit is configured to determine the screen area where the pixel position is located as the eye attention area.
  • the coordinate conversion unit may include:
  • the distance calculation subunit is configured to calculate a first distance and a second distance respectively according to the coordinates of the reference pixel point and the coordinates of the eye point of interest, where the first distance is the reference pixel point and the eye The distance of the point of interest in the preset first coordinate axis direction, where the second distance is the distance between the reference pixel point and the eye point of interest in the preset second coordinate axis direction;
  • a first pixel position calculation subunit configured to calculate the pixel position of the eye point of interest in the direction of the first coordinate axis according to the first distance and a preset first conversion coefficient
  • the second pixel position calculation subunit is configured to calculate the pixel position of the eye point of interest in the direction of the second coordinate axis according to the second distance and a preset second conversion coefficient.
  • the line-of-sight attention direction determining module is specifically configured to input the left-eye image, the right-eye image, and the head posture into a pre-trained line-of-sight prediction model for processing to obtain the line-of-sight attention direction ;
  • the sight attention direction determining module may include:
  • the feature information extraction sub-module is configured to extract feature information from the left-eye image and the right-eye image to obtain left-eye feature information and right-eye feature information;
  • the binocular feature information determining sub-module is configured to perform fusion processing on the left eye feature information and the right eye feature information to obtain binocular feature information;
  • the gaze attention direction determination sub-module is used to perform fusion processing on the binocular feature information and the head posture to obtain the gaze attention direction.
  • the device for detecting a region of interest may further include:
  • the sample set construction module is used to construct a training sample set, wherein the training sample set includes SN training samples, and each training sample includes pre-collected left-eye images, right-eye images and head poses of the subject, And the label corresponding to each training sample is the pre-calibrated line of sight attention direction, and SN is a positive integer;
  • the model training module is used to train the line-of-sight prediction model in the initial state by using the training sample set to obtain the pre-trained line-of-sight prediction model.
  • the device for detecting a region of interest may further include:
  • the facial feature information extraction module is used to extract the facial feature information in the target face image
  • a user information determining module configured to determine user information corresponding to the target face image according to the facial feature information
  • the screen display information determining module is used to determine the screen display information corresponding to the eye focus area
  • the correspondence relationship establishment module is used to establish the correspondence relationship between the user information and the screen display information.
  • FIG. 10 shows a schematic block diagram of a terminal device provided by an embodiment of the present application. For ease of description, only parts related to the embodiment of the present application are shown.
  • the terminal device 10 of this embodiment includes: a processor 100, a memory 101, and a computer program 102 stored in the memory 101 and running on the processor 100.
  • the processor 100 executes the computer program 102, the steps in the foregoing embodiments of the region of interest detection method are implemented, for example, step S101 to step S106 shown in FIG. 1.
  • the processor 100 executes the computer program 102, the functions of the modules/units in the foregoing device embodiments, for example, the functions of the modules 901 to 906 shown in FIG. 9 are realized.
  • The computer program 102 may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 101 and executed by the processor 100 to complete this application.
  • the one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 102 in the terminal device 10.
  • the terminal device 10 may be a computing device such as a desktop computer, a notebook, a palmtop computer, a smart phone, a server, and a smart screen.
  • FIG. 10 is only an example of the terminal device 10 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown in the figure, may combine certain components, or may have different components.
  • the terminal device 10 may also include an input/output device, a network access device, a bus, and the like.
  • The processor 100 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the processor 100 may be the nerve center and command center of the terminal device 10, and the processor 100 may generate operation control signals according to instruction operation codes and timing signals, and complete the control of fetching instructions and executing instructions.
  • the memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10.
  • The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 10.
  • the memory 101 may also include both an internal storage unit of the terminal device 10 and an external storage device.
  • the memory 101 is used to store the computer program and other programs and data required by the terminal device 10.
  • the memory 101 can also be used to temporarily store data that has been output or will be output.
  • The terminal device 10 may also include a communication module, and the communication module may provide communication solutions applied on the device, including wireless local area networks (WLAN, such as Wi-Fi networks), Bluetooth, Zigbee, mobile communication networks, the Global Navigation Satellite System (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the communication module may be one or more devices integrating at least one communication processing module.
  • the communication module may include an antenna, and the antenna may have only one array element or an antenna array including multiple array elements.
  • the communication module can receive electromagnetic waves through an antenna, frequency-modulate and filter the electromagnetic wave signals, and send the processed signals to the processor.
  • the communication module can also receive the signal to be sent from the processor, perform frequency modulation and amplification, and convert it into electromagnetic waves to radiate through the antenna.
  • the terminal device 10 may also include a power management module, which can receive input from an external power source, a battery, and/or a charger, and supply power to the processor, the memory, the communication module, and the like.
  • the terminal device 10 may also include a display module, which may be used to display information input by the user or information provided to the user.
  • The display module may include a display panel, and the display panel may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • Further, a touch panel may cover the display panel. When the touch panel detects a touch operation on or near it, the operation is transmitted to the processor to determine the type of the touch event, and the processor then provides a corresponding visual output on the display panel according to the type of the touch event.
  • the disclosed device/terminal device and method may be implemented in other ways.
  • the device/terminal device embodiments described above are merely illustrative.
  • The division of the modules or units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • The embodiments of the present application also provide a computer program product which, when run on a terminal device, causes the terminal device to implement the steps in the foregoing method embodiments.
  • If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • When the present application implements all or part of the processes in the methods of the above embodiments, this can also be completed by a computer program instructing relevant hardware.
  • the computer program can be stored in a computer-readable storage medium. When the program is executed by the processor, it can implement the steps of the foregoing method embodiments.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file, or some intermediate forms.
  • The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on.

Abstract

A region of concern detection method and apparatus, and a computer readable storage medium and a terminal device. The method comprises: obtaining a target face image to be detected (S101); calculating a head posture of the target face image (S102); extracting a left eye image and a right eye image from the target face image (S103); determining a sight line concern direction according to the left eye image, the right eye image, and the head posture (S104); respectively performing eye key point detection in the left eye image and the right eye image to obtain a coordinate of a left eye pupil center point and a coordinate of a right eye pupil center point (S105); and determining a region of eye concern according to the sight line concern direction, the coordinate of the left eye pupil center point, and the coordinate of the right eye pupil center point (S106). An expensive precision instrument does not need to be used, and a region of eye concern is determined by analyzing and processing a face image, so that costs are greatly reduced, and a wide range of applications can be carried out.

Description

Method, device, readable storage medium and terminal device for detecting a region of interest

Technical field

The present invention relates to the field of image processing technology, and in particular to a method, device, computer-readable storage medium and terminal device for detecting a region of interest.

Background

Offline screens mainly include televisions, building screens, cinema ticket machines, cinema LED displays, supermarket and convenience-store POS screens, taxi-mounted screens, projectors, and express-locker screens, and they touch almost every aspect of users' lives. Offline advertising based on such screens has a natural advantage in attracting consumers' attention, but advertisers cannot quickly learn whether the design and content of an offline advertisement actually appeal to consumers, so feedback on offline advertising is usually slower and less accurate than online feedback. As a result, some advertisements are placed imprecisely and inefficiently. In the prior art, precision instruments such as eye trackers can be used to track eye movement and determine the area of interest, but such instruments are very expensive and difficult to apply widely.

Technical solutions

In view of this, the embodiments of the present application provide a method, a device, a computer-readable storage medium, and a terminal device for detecting an area of interest, to solve the problem that the existing area-of-interest detection methods are very expensive and therefore difficult to apply widely.
The first aspect of the embodiments of the present application provides a method for detecting a region of interest, which may include:

acquiring a target face image to be detected;

calculating the head posture of the target face image;

extracting a left-eye image and a right-eye image from the target face image;

determining the attention direction of the line of sight according to the left-eye image, the right-eye image, and the head posture;

performing eye key point detection in the left-eye image and the right-eye image respectively, to obtain the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil;

determining the eye area of interest according to the attention direction of the line of sight, the coordinates of the center point of the left eye pupil, and the coordinates of the center point of the right eye pupil.
Further, the determining the eye area of interest according to the attention direction of the line of sight, the coordinates of the center point of the left eye pupil, and the coordinates of the center point of the right eye pupil includes:

calculating the coordinates of the center point of the pupils of both eyes according to the coordinates of the center point of the left eye pupil and the coordinates of the center point of the right eye pupil;

calculating the point-to-plane distance between the center point of the pupils of both eyes and a preset screen according to the coordinates of the center point of the pupils of both eyes;

calculating the coordinates of the eye point of interest according to the attention direction of the line of sight, the coordinates of the center point of the pupils of both eyes, and the point-to-plane distance;

determining the eye area of interest according to the coordinates of the eye point of interest.
Further, the determining the eye area of interest according to the coordinates of the eye point of interest includes:

converting the coordinates of the eye point of interest according to the coordinates of a preset reference pixel point to obtain the pixel position of the eye point of interest on the screen;

judging whether the pixel position is within the range of the screen according to a preset screen resolution;

if the pixel position is within the range of the screen, determining the screen area where the pixel position is located according to a preset screen area division rule;

determining the screen area where the pixel position is located as the eye focus area.
Further, the converting the coordinates of the eye point of interest according to the coordinates of the preset reference pixel point includes:

calculating a first distance and a second distance respectively according to the coordinates of the reference pixel point and the coordinates of the eye point of interest, where the first distance is the distance between the reference pixel point and the eye point of interest in the preset first coordinate axis direction, and the second distance is the distance between the reference pixel point and the eye point of interest in the preset second coordinate axis direction;

calculating the pixel position of the eye point of interest in the direction of the first coordinate axis according to the first distance and a preset first conversion coefficient;

calculating the pixel position of the eye point of interest in the direction of the second coordinate axis according to the second distance and a preset second conversion coefficient.
Further, the determining the attention direction of the line of sight according to the left-eye image, the right-eye image, and the head posture includes:

inputting the left-eye image, the right-eye image, and the head posture into a pre-trained line-of-sight prediction model for processing, to obtain the attention direction of the line of sight;

the processing of the line-of-sight prediction model includes:

extracting feature information from the left-eye image and the right-eye image respectively, to obtain left-eye feature information and right-eye feature information;

fusing the left-eye feature information and the right-eye feature information to obtain binocular feature information;

fusing the binocular feature information and the head posture to obtain the attention direction of the line of sight.
Further, before inputting the left-eye image, the right-eye image, and the head posture into the pre-trained line-of-sight prediction model for processing, the method may further include:

constructing a training sample set, where the training sample set includes SN training samples, each training sample includes a pre-collected left-eye image, right-eye image and head posture of a subject, the label corresponding to each training sample is a pre-calibrated line-of-sight attention direction, and SN is a positive integer;

training the line-of-sight prediction model in its initial state using the training sample set, to obtain the pre-trained line-of-sight prediction model.
Further, after the eye region of interest is determined according to the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point, the method may further include:
extracting facial feature information from the target face image;
determining user information corresponding to the target face image according to the facial feature information;
determining screen display information corresponding to the eye region of interest;
establishing a correspondence between the user information and the screen display information.
A second aspect of the embodiments of the present application provides a region-of-interest detection apparatus, which may include:
a face image acquisition module, configured to acquire a target face image to be detected;
a head pose calculation module, configured to calculate the head pose of the target face image;
an eye image extraction module, configured to extract a left-eye image and a right-eye image from the target face image;
a gaze direction determination module, configured to determine a gaze direction according to the left-eye image, the right-eye image, and the head pose;
an eye key point detection module, configured to perform eye key point detection in the left-eye image and the right-eye image respectively to obtain the coordinates of the left pupil center point and the coordinates of the right pupil center point;
an eye region-of-interest determination module, configured to determine an eye region of interest according to the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point.
Further, the eye region-of-interest determination module may include:
a center point coordinate calculation submodule, configured to calculate the coordinates of the binocular pupil center point according to the coordinates of the left pupil center point and the coordinates of the right pupil center point;
a point-to-plane distance calculation submodule, configured to calculate the point-to-plane distance between the binocular pupil center point and a preset screen according to the coordinates of the binocular pupil center point;
an eye point-of-interest coordinate calculation submodule, configured to calculate the coordinates of the eye point of interest according to the gaze direction, the coordinates of the binocular pupil center point, and the point-to-plane distance;
an eye region-of-interest determination submodule, configured to determine the eye region of interest according to the coordinates of the eye point of interest.
Further, the eye region-of-interest determination submodule may include:
a coordinate conversion unit, configured to convert the coordinates of the eye point of interest according to the coordinates of a preset reference pixel to obtain the pixel position of the eye point of interest on the screen;
a pixel position judgment unit, configured to judge, according to a preset screen resolution, whether the pixel position is within the range of the screen;
a screen region determination unit, configured to determine, if the pixel position is within the range of the screen, the screen region where the pixel position is located according to a preset screen region division rule;
an eye region-of-interest determination unit, configured to determine the screen region where the pixel position is located as the eye region of interest.
Further, the coordinate conversion unit may include:
a distance calculation subunit, configured to calculate a first distance and a second distance according to the coordinates of the reference pixel and the coordinates of the eye point of interest, where the first distance is the distance between the reference pixel and the eye point of interest in the direction of a preset first coordinate axis, and the second distance is the distance between the reference pixel and the eye point of interest in the direction of a preset second coordinate axis;
a first pixel position calculation subunit, configured to calculate the pixel position of the eye point of interest in the direction of the first coordinate axis according to the first distance and a preset first conversion coefficient;
a second pixel position calculation subunit, configured to calculate the pixel position of the eye point of interest in the direction of the second coordinate axis according to the second distance and a preset second conversion coefficient.
Further, the gaze direction determination module is specifically configured to input the left-eye image, the right-eye image, and the head pose into a pre-trained gaze prediction model for processing to obtain the gaze direction;
the gaze direction determination module may include:
a feature information extraction submodule, configured to extract feature information from the left-eye image and the right-eye image respectively to obtain left-eye feature information and right-eye feature information;
a binocular feature information determination submodule, configured to fuse the left-eye feature information and the right-eye feature information to obtain binocular feature information;
a gaze direction determination submodule, configured to fuse the binocular feature information and the head pose to obtain the gaze direction.
Further, the region-of-interest detection apparatus may further include:
a sample set construction module, configured to construct a training sample set, where the training sample set includes SN training samples, each training sample includes a pre-collected left-eye image, right-eye image, and head pose of a subject, the label of each training sample is a pre-calibrated gaze direction, and SN is a positive integer;
a model training module, configured to train an initial gaze prediction model using the training sample set to obtain the pre-trained gaze prediction model.
Further, the region-of-interest detection apparatus may further include:
a facial feature information extraction module, configured to extract facial feature information from the target face image;
a user information determination module, configured to determine user information corresponding to the target face image according to the facial feature information;
a screen display information determination module, configured to determine screen display information corresponding to the eye region of interest;
a correspondence establishment module, configured to establish a correspondence between the user information and the screen display information.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the steps of any of the above region-of-interest detection methods.
A fourth aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of any of the above region-of-interest detection methods.
A fifth aspect of the embodiments of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the steps of any of the above region-of-interest detection methods.
Compared with the prior art, the embodiments of the present application have the following beneficial effects: an embodiment of the present application acquires a target face image to be detected; calculates the head pose of the target face image; extracts a left-eye image and a right-eye image from the target face image; determines a gaze direction according to the left-eye image, the right-eye image, and the head pose; performs eye key point detection in the left-eye image and the right-eye image respectively to obtain the coordinates of the left pupil center point and the coordinates of the right pupil center point; and determines an eye region of interest according to the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point. In the embodiments of the present application, no expensive precision instrument is required; instead, the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point are obtained through image analysis of a face image, and the eye region of interest is determined from them, which greatly reduces cost and enables much wider application.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of an embodiment of a region-of-interest detection method in an embodiment of the present application;
Fig. 2 is a schematic diagram of the 3D coordinate system established in an embodiment of the present application;
Fig. 3 is a schematic diagram of the network structure of the gaze prediction model;
Fig. 4 is a schematic diagram of the gaze direction;
Fig. 5 is a schematic diagram of the pupil center points;
Fig. 6 is a schematic flowchart of determining the eye region of interest according to the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point;
Fig. 7 is a schematic diagram of calculating the coordinates of the eye point of interest;
Fig. 8 is a schematic flowchart of determining the eye region of interest according to the coordinates of the eye point of interest;
Fig. 9 is a structural diagram of an embodiment of a region-of-interest detection apparatus in an embodiment of the present application;
Fig. 10 is a schematic block diagram of a terminal device in an embodiment of the present application.
Embodiments of the present invention
Referring to Fig. 1, an embodiment of a region-of-interest detection method in an embodiment of the present application may include:
Step S101: acquire a target face image to be detected.
In the embodiments of the present application, in order to determine the user's region of interest on a screen, a depth camera may be configured for the screen. The depth camera may be built into the screen or serve as an external device of the screen.
In the embodiments of the present application, a 3D coordinate system as shown in Fig. 2 may be established from the camera coordinate system of the depth camera, and the coordinates of the four corner points of the screen, namely the upper-left (left_up in Fig. 2), upper-right (right_up in Fig. 2), lower-left (left_bottom in Fig. 2), and lower-right (right_bottom in Fig. 2) corners, may be calibrated in this 3D coordinate system in advance.
The execution subject of the embodiments of the present application may be a terminal device connected to the screen in a wired or wireless manner, including but not limited to a desktop computer, a notebook, a palmtop computer, a smartphone, a server, or any other terminal device with data processing capability. In particular, if the screen is a smart screen with data processing capability, it may itself serve as the terminal device executing the embodiments of the present application, without relying on any other external terminal device.
In a specific implementation of the embodiments of the present application, an image of the area around the screen may be captured by the depth camera and face detection performed on that image; if a face is detected, the current face image, that is, the target face image, may be cropped out.
Step S102: calculate the head pose of the target face image.
After the target face image is acquired, 3D facial key point detection may be performed on it, and the head pose of the target face image may be calculated from these 3D key points.
In a specific implementation of the embodiments of the present application, an Iterative Closest Point (ICP) algorithm may be used to calculate the head pose. Specifically, a reference point cloud serving as the comparison baseline is preset, containing the 3D key points used as the baseline. The detected 3D key points are then assembled into the point cloud of the target face image; corresponding points between the two point clouds are determined by the nearest-neighbor criterion, the transformation between them is solved by least squares, and this transformation is applied to rotate the point cloud of the target face image, yielding an updated point cloud. The above process is repeated until a preset termination condition is reached, at which point the iteration stops. Finally, the rotation angles from all iterations are accumulated, and the result is the head pose of the target face image.
It should be noted that the above head pose calculation process is only an example; in practical applications, any head pose calculation method in the prior art may be selected according to the specific situation, which is not specifically limited in the embodiments of the present application.
Here the calculated head pose is denoted as headpose[theta, phi], where theta is the pitch angle of the head (looking up or down) and phi is the yaw angle of the head in the horizontal direction.
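As a rough illustration of the ICP procedure described above (not the exact implementation of the embodiments), the following Python sketch aligns the detected key points to a reference point cloud using nearest-neighbor correspondences and an SVD-based least-squares rigid fit; the convergence threshold and the Euler-angle extraction convention at the end are assumptions made for illustration.

```python
# Minimal ICP sketch (illustrative only): aligns detected 3D face key points
# to a preset reference point cloud and accumulates the rotation as a head pose.
import numpy as np
from scipy.spatial import cKDTree

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst, via SVD."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp_head_pose(face_pts, ref_pts, max_iter=50, tol=1e-6):
    """Iterate nearest-neighbor matching + rigid fit; return pitch/yaw angles."""
    tree = cKDTree(ref_pts)
    pts = face_pts.copy()
    R_total = np.eye(3)
    prev_err = np.inf
    for _ in range(max_iter):
        _, idx = tree.query(pts)          # nearest-neighbor correspondences
        R, t = best_fit_transform(pts, ref_pts[idx])
        pts = pts @ R.T + t               # rotate the face point cloud
        R_total = R @ R_total             # accumulate rotation across iterations
        err = np.mean(np.linalg.norm(pts - ref_pts[idx], axis=1))
        if abs(prev_err - err) < tol:     # preset termination condition
            break
        prev_err = err
    # One possible convention for extracting pitch (theta) and yaw (phi);
    # the embodiments do not fix a specific Euler convention.
    theta = np.degrees(np.arcsin(-R_total[2, 1]))
    phi = np.degrees(np.arctan2(R_total[2, 0], R_total[2, 2]))
    return theta, phi
```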
Step S103: extract the left-eye image and the right-eye image from the target face image.
Taking the extraction of the left-eye image as an example, in a specific implementation of the embodiments of the present application, the left-eye key points may first be selected from the detected 3D key points. Denoting the minimum and maximum abscissas of these left-eye key points as left_x_min and left_x_max, and the minimum and maximum ordinates as left_y_min and left_y_max, the image within the rectangular region (denoted LA1) formed by the following four corner points may be taken as the left-eye image: (left_x_min, left_y_max), (left_x_min, left_y_min), (left_x_max, left_y_max), (left_x_max, left_y_min). Further, since cropping the left-eye image directly from these extreme values may lose edge information, LA1 may be expanded outward to obtain a new rectangular region LA2, and the image within LA2 taken as the left-eye image. The extraction of the right-eye image is similar to that of the left-eye image and is not repeated here.
It should be noted that the above eye image extraction process is only an example; in practical applications, any eye image extraction method in the prior art may be selected according to the specific situation, which is not specifically limited in the embodiments of the present application.
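A minimal sketch of this bounding-box crop, assuming the key points are given as pixel coordinates and using an illustrative margin ratio for the outward expansion of LA1 into LA2:

```python
import numpy as np

def crop_eye(image, eye_points, margin=0.25):
    """Crop an eye region from `image` given its key points (N x 2 pixel coords).

    The tight box LA1 spans the min/max coordinates of the key points; it is
    expanded by `margin` (a fraction of the box size) to form LA2 so that no
    edge information is lost. The value of `margin` is an illustrative choice,
    not one specified by the embodiments.
    """
    x_min, y_min = eye_points.min(axis=0)
    x_max, y_max = eye_points.max(axis=0)
    dx, dy = (x_max - x_min) * margin, (y_max - y_min) * margin
    h, w = image.shape[:2]
    x0 = max(int(x_min - dx), 0)
    y0 = max(int(y_min - dy), 0)
    x1 = min(int(x_max + dx), w)
    y1 = min(int(y_max + dy), h)
    return image[y0:y1, x0:x1]
```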
Step S104: determine the gaze direction according to the left-eye image, the right-eye image, and the head pose.
In the embodiments of the present application, the left-eye image, the right-eye image, and the head pose may be input into a pre-trained gaze prediction model for processing to obtain the gaze direction.
As shown in Fig. 3, the gaze prediction model uses a multi-input neural network structure. The model first extracts feature information from the left-eye image and the right-eye image respectively to obtain left-eye feature information and right-eye feature information, then fuses the left-eye feature information and the right-eye feature information to obtain binocular feature information, and finally fuses the binocular feature information and the head pose to obtain the gaze direction, as shown in Fig. 4.
It should be noted that, in the embodiments of the present application, the two eyes are assumed to share the same gaze direction; the case of crossed eyes is not considered.
The specific processing of the gaze prediction model is now described with reference to Fig. 3:
For the left-eye image, a ResNet18 block (ResNet18 Block in Fig. 3) is used to extract feature information from the left-eye image (Left eye in Fig. 3); the extracted feature information is then passed in sequence through average pooling (Avg_pooling in Fig. 3), a fully connected layer (FC_Left in Fig. 3), batch normalization (BN_Left in Fig. 3), and an activation function (Relu_Left in Fig. 3) to obtain the left-eye feature information.
For the right-eye image, a ResNet18 block (ResNet18 Block in Fig. 3) is used to extract feature information from the right-eye image (Right eye in Fig. 3); the extracted feature information is then passed in sequence through average pooling (Avg_pooling in Fig. 3), a fully connected layer (FC_Right in Fig. 3), batch normalization (BN_Right in Fig. 3), and an activation function (Relu_Right in Fig. 3) to obtain the right-eye feature information.
After the left-eye feature information and the right-eye feature information are obtained, they are concatenated (EyesConcat in Fig. 3) so that the two are fused, and the fused information is passed through a fully connected layer (EyesFc1 in Fig. 3) to obtain the binocular feature information.
After the binocular feature information is obtained, it is concatenated with the head pose (HeadPose in Fig. 3) via HeadConcat in Fig. 3 so that the two are fused, and the fused information is passed through batch normalization (BN_Head in Fig. 3), an activation function (Relu_Head in Fig. 3), and a fully connected layer (Fc_Head in Fig. 3) to obtain the gaze direction (Gaze in Fig. 3).
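A compact PyTorch sketch of a network with this shape is given below. It follows the block sequence just described, but the layer widths, the use of separate (rather than shared) ResNet18 trunks, and the two-angle output head are assumptions made for illustration, not the exact architecture of the embodiments.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GazeNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        # ResNet18 trunks up to global average pooling (512-d features).
        self.left_trunk = nn.Sequential(*list(resnet18(weights=None).children())[:-1])
        self.right_trunk = nn.Sequential(*list(resnet18(weights=None).children())[:-1])
        # Per-eye FC + BN + ReLU heads (FC_Left/BN_Left/Relu_Left and right-eye twins).
        self.left_head = nn.Sequential(nn.Linear(512, feat_dim),
                                       nn.BatchNorm1d(feat_dim), nn.ReLU())
        self.right_head = nn.Sequential(nn.Linear(512, feat_dim),
                                        nn.BatchNorm1d(feat_dim), nn.ReLU())
        self.eyes_fc = nn.Linear(2 * feat_dim, feat_dim)   # EyesFc1 after EyesConcat
        # BN + ReLU + FC after concatenating the 2-d head pose (HeadConcat).
        self.head_bn = nn.BatchNorm1d(feat_dim + 2)
        self.out_fc = nn.Linear(feat_dim + 2, 2)           # Gaze: [gaze_theta, gaze_phi]

    def forward(self, left_eye, right_eye, head_pose):
        l = self.left_head(self.left_trunk(left_eye).flatten(1))
        r = self.right_head(self.right_trunk(right_eye).flatten(1))
        eyes = self.eyes_fc(torch.cat([l, r], dim=1))      # binocular features
        x = torch.cat([eyes, head_pose], dim=1)            # fuse with head pose
        return self.out_fc(torch.relu(self.head_bn(x)))
```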
Here the calculated gaze direction is denoted as Gaze[gaze_theta, gaze_phi], where gaze_theta is the pitch angle of the gaze (looking up or down) and gaze_phi is the yaw angle of the gaze in the horizontal direction.
For convenience in subsequent calculations, the gaze direction may be converted from angle form to vector form according to the following formulas:
vectorx = cos(gaze_theta)*sin(gaze_phi)
vectory = sin(gaze_theta)
vectorz = cos(gaze_theta)*cos(gaze_phi)
init_vector = (vectorx, vectory, vectorz)
where init_vector is the gaze direction in vector form, and its components on the x-axis, y-axis, and z-axis are vectorx, vectory, and vectorz respectively.
Preferably, the gaze direction in vector form may also be normalized according to the following formulas to obtain the normalized gaze direction vector:
norm = sqrt(vectorx^2 + vectory^2 + vectorz^2)
gaze_vector = init_vector / norm
where norm is the magnitude of the gaze direction vector, and gaze_vector is the normalized gaze direction vector, whose components on the x-axis, y-axis, and z-axis are gaze_vector[x], gaze_vector[y], and gaze_vector[z] respectively.
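These two conversions translate directly into code; a minimal sketch (the function name is assumed for illustration):

```python
import numpy as np

def gaze_angles_to_vector(gaze_theta, gaze_phi):
    """Convert gaze pitch/yaw angles (radians) to a normalized 3D direction."""
    init_vector = np.array([
        np.cos(gaze_theta) * np.sin(gaze_phi),   # vectorx
        np.sin(gaze_theta),                      # vectory
        np.cos(gaze_theta) * np.cos(gaze_phi),   # vectorz
    ])
    return init_vector / np.linalg.norm(init_vector)   # gaze_vector
```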
Through the processing shown in Fig. 3, the left-eye feature information, the right-eye feature information, and the head pose are fused and considered jointly when predicting the gaze direction, which greatly improves the accuracy of the final prediction.
Preferably, before step S104, a training sample set may be constructed in advance and used to train an initial gaze prediction model to obtain the pre-trained gaze prediction model.
The training sample set includes SN training samples, each training sample includes a pre-collected left-eye image, right-eye image, and head pose of a subject, the label of each training sample is a pre-calibrated gaze direction, and SN is a positive integer.
The training of a neural network is a well-established technique; any neural network training method in the prior art may be referred to for details, which are not repeated in the embodiments of the present application.
Through this training process, a large number of samples from actual tests are collected in advance to construct the training sample set, and the gaze prediction model is trained on this measured data, so that the resulting model better matches real conditions and the gaze direction detected on its basis is more accurate.
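For illustration, a bare-bones supervised training loop over such a sample set might look as follows, using the GazeNet sketch above; the dataset object, the loss choice, and the optimizer settings are assumptions, not details specified by the embodiments.

```python
import torch
from torch.utils.data import DataLoader

# `train_set` is assumed to yield (left_eye, right_eye, head_pose, gaze_label)
# tuples, with gaze_label = [gaze_theta, gaze_phi] calibrated for each sample.
model = GazeNet()
loader = DataLoader(train_set, batch_size=64, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.MSELoss()   # regression of the two gaze angles

for epoch in range(20):
    for left_eye, right_eye, head_pose, gaze_label in loader:
        optimizer.zero_grad()
        pred = model(left_eye, right_eye, head_pose)
        loss = criterion(pred, gaze_label)
        loss.backward()
        optimizer.step()
```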
Step S105: perform eye key point detection in the left-eye image and the right-eye image respectively to obtain the coordinates of the left pupil center point and the coordinates of the right pupil center point.
In the embodiments of the present application, an eye landmark model (EyeLandMarkModel, ELM) is preferably used to perform eye key point detection in the left-eye image and the right-eye image respectively, so as to obtain the coordinates of the left pupil center point (denoted left_iris_center) and the coordinates of the right pupil center point (denoted right_iris_center), as shown in Fig. 5. It should be noted that, since a depth camera is used in the present application, both sets of coordinates are three-dimensional and may be denoted as (x_left, y_left, z_left) and (x_right, y_right, z_right), where x_left, y_left, z_left are the coordinates of the left pupil center point on the x-axis, y-axis, and z-axis, and x_right, y_right, z_right are the coordinates of the right pupil center point on the x-axis, y-axis, and z-axis.
Step S106: determine the eye region of interest according to the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point.
In a specific implementation of the embodiments of the present application, step S106 may specifically include the process shown in Fig. 6:
Step S1061: calculate the coordinates of the binocular pupil center point according to the coordinates of the left pupil center point and the coordinates of the right pupil center point.
Specifically, the coordinates of the binocular pupil center point may be calculated according to the following formula:
middle_pos = (left_iris_center + right_iris_center) / 2
where middle_pos is the coordinates of the binocular pupil center point, middle_pos = (x_middle, y_middle, z_middle), and x_middle, y_middle, z_middle are its coordinates on the x-axis, y-axis, and z-axis respectively.
Step S1062: calculate the point-to-plane distance between the binocular pupil center point and the preset screen according to the coordinates of the binocular pupil center point.
Denoting the normal vector of the plane in which the screen lies as n = (A, B, C), the point-to-plane distance between the binocular pupil center point and the screen may be calculated according to the following formula:
iris_distance = (A*x_middle + B*y_middle + C*z_middle) / sqrt(A^2 + B^2 + C^2)
where sqrt is the square root function and iris_distance is the point-to-plane distance between the binocular pupil center point and the screen.
Step S1063: calculate the coordinates of the eye point of interest according to the gaze direction, the coordinates of the binocular pupil center point, and the point-to-plane distance.
The eye point of interest is the projection of the gaze onto the screen. Specifically, its coordinates may be calculated according to the following formula:
project_3d = middle_pos + gaze_vector*(iris_distance/gaze_vector[z])
where project_3d is the coordinates of the eye point of interest.
Fig. 7 is a schematic diagram of calculating the coordinates of the eye point of interest. Through this process, the coordinates of the eye point of interest are computed precisely from the geometric spatial relationships, and the eye region of interest determined on this basis is accordingly more accurate.
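Steps S1061 to S1063 reduce to a few lines of vector arithmetic. The sketch below expresses the screen plane as A*x + B*y + C*z + D = 0 in the camera frame; the formulas above correspond to the case D = 0 (a plane through the camera origin), so D is included here only for generality.

```python
import numpy as np

def eye_point_of_interest(left_iris_center, right_iris_center,
                          gaze_vector, plane):
    """Project the gaze ray onto the screen plane.

    `plane` = (A, B, C, D) describes the screen plane A*x + B*y + C*z + D = 0;
    the embodiments' formulas use D = 0. `gaze_vector` must be normalized.
    """
    A, B, C, D = plane
    n = np.array([A, B, C])
    # Step S1061: binocular pupil center point.
    middle_pos = (np.asarray(left_iris_center) + np.asarray(right_iris_center)) / 2
    # Step S1062: point-to-plane distance from the pupil center to the screen.
    iris_distance = (n @ middle_pos + D) / np.linalg.norm(n)
    # Step S1063: walk along the gaze direction until the screen plane is hit.
    project_3d = middle_pos + gaze_vector * (iris_distance / gaze_vector[2])
    return project_3d
```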
Step S1064: determine the eye region of interest according to the coordinates of the eye point of interest.
In a specific implementation of the embodiments of the present application, step S1064 may specifically include the process shown in Fig. 8:
Step S10641: convert the coordinates of the eye point of interest according to the coordinates of the preset reference pixel to obtain the pixel position of the eye point of interest on the screen.
The reference pixel may be any one of the four corner points shown in Fig. 2; here the upper-left corner point is preferably chosen as the reference pixel.
When performing the coordinate conversion, the first distance and the second distance may first be calculated according to the coordinates of the reference pixel and the coordinates of the eye point of interest; the pixel position of the eye point of interest in the direction of the first coordinate axis is then calculated from the first distance and a preset first conversion coefficient, and the pixel position of the eye point of interest in the direction of the second coordinate axis is calculated from the second distance and a preset second conversion coefficient.
The first distance is the distance between the reference pixel and the eye point of interest in the direction of the preset first coordinate axis (the x-axis in Fig. 2); the second distance is the distance between the reference pixel and the eye point of interest in the direction of the preset second coordinate axis (the y-axis in Fig. 2). The first conversion coefficient is the number of pixels per unit length in the direction of the first coordinate axis; the second conversion coefficient is the number of pixels per unit length in the direction of the second coordinate axis.
The specific coordinate conversion formulas are as follows:
project_pixel[x] = (project_3d[x] - left_up[x]) * scalex
project_pixel[y] = (project_3d[y] - left_up[y]) * scaley
where project_3d[x] is the coordinate of the eye point of interest in the direction of the first coordinate axis, left_up[x] is the coordinate of the reference pixel in the direction of the first coordinate axis, scalex is the first conversion coefficient, project_pixel[x] is the pixel position of the eye point of interest in the direction of the first coordinate axis, project_3d[y] is the coordinate of the eye point of interest in the direction of the second coordinate axis, left_up[y] is the coordinate of the reference pixel in the direction of the second coordinate axis, scaley is the second conversion coefficient, and project_pixel[y] is the pixel position of the eye point of interest in the direction of the second coordinate axis.
Through the above coordinate conversion, the pixel position of the eye point of interest on the screen is obtained, from which the eye region of interest can be determined accurately.
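In code, the conversion is a subtraction and a scale. The conversion coefficients can be derived from the calibrated corner points and the screen resolution; the derivation of scalex and scaley below is an assumption consistent with their definition as pixels per unit length, since the embodiments simply treat them as preset values.

```python
def to_pixel(project_3d, left_up, right_bottom, resolution):
    """Map a 3D point on the screen plane to a pixel position.

    `left_up` / `right_bottom` are the calibrated 3D corner points and
    `resolution` = (MaxX, MaxY) is the screen resolution.
    """
    max_x, max_y = resolution
    scalex = max_x / (right_bottom[0] - left_up[0])   # pixels per unit length, x
    scaley = max_y / (right_bottom[1] - left_up[1])   # pixels per unit length, y
    px = (project_3d[0] - left_up[0]) * scalex        # project_pixel[x]
    py = (project_3d[1] - left_up[1]) * scaley        # project_pixel[y]
    return px, py
```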
Step S10642: judge, according to the preset screen resolution, whether the pixel position is within the range of the screen.
Denoting the screen resolution as MaxX × MaxY, if the pixel position satisfies 0 < project_pixel[x] < MaxX and 0 < project_pixel[y] < MaxY, it can be judged to be within the range of the screen; otherwise, it can be judged to be outside the range of the screen.
If the pixel position is not within the range of the screen, the user is not paying attention to the content on the screen and no further processing is needed; if the pixel position is within the range of the screen, the subsequent steps are carried out.
Step S10643: determine the screen region where the pixel position is located according to the preset screen region division rule.
Step S10644: determine the screen region where the pixel position is located as the eye region of interest.
In the embodiments of the present application, the screen may be divided in advance into KN screen regions (KN being an integer greater than 1), numbered from top to bottom and left to right as screen region 1, screen region 2, ..., screen region k, ..., screen region KN, where 1 ≤ k ≤ KN. If the pixel position falls within screen region 1, screen region 1 may be determined as the eye region of interest; if it falls within screen region 2, screen region 2 may be determined as the eye region of interest; ...; if it falls within screen region k, screen region k may be determined as the eye region of interest; ...; and if it falls within screen region KN, screen region KN may be determined as the eye region of interest.
With the range of each screen region set in advance, once the pixel position of the eye point of interest on the screen has been computed, it is only necessary to determine which screen region that pixel position falls in to obtain the corresponding eye region of interest. The computation involved is minimal, which greatly improves the efficiency of region-of-interest detection.
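Assuming the division rule is a regular grid of rows × cols cells (one possible rule; the embodiments only require that the region ranges be preset), the bounds check of step S10642 and the region lookup of steps S10643 and S10644 reduce to:

```python
def screen_region(px, py, resolution, rows, cols):
    """Return the 1-based screen region index for a pixel position,
    or None if the position lies outside the screen."""
    max_x, max_y = resolution
    if not (0 < px < max_x and 0 < py < max_y):
        return None                      # gaze is off-screen; nothing to do
    col = int(px * cols / max_x)         # 0-based column of the grid cell
    row = int(py * rows / max_y)         # 0-based row of the grid cell
    return row * cols + col + 1          # numbered top-to-bottom, left-to-right
```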
Further, after the eye region of interest is determined, facial feature information may also be extracted from the target face image, and user information corresponding to the target face image determined from the facial feature information; such user information includes but is not limited to age, gender, and the like.
In a specific application of the embodiments of the present application, the divided screen regions may be used to display different pieces of information, including but not limited to advertisements, news, announcements, and so on. After the eye region of interest is determined, the screen display information corresponding to the eye region of interest can be further determined, and a correspondence established between the user information and the screen display information.
In this way, a large amount of statistical data can be collected, for example, how many times each piece of screen display information has been viewed and by what type of user (user types may be divided by age, gender, etc.). Using these statistics as the basis for replacing and placing screen display information greatly improves the accuracy and efficiency of its placement.
In summary, the embodiments of the present application acquire a target face image to be detected; calculate the head pose of the target face image; extract a left-eye image and a right-eye image from the target face image; determine a gaze direction according to the left-eye image, the right-eye image, and the head pose; perform eye key point detection in the left-eye image and the right-eye image respectively to obtain the coordinates of the left pupil center point and the coordinates of the right pupil center point; and determine an eye region of interest according to the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point. No expensive precision instrument is required; instead, the gaze direction and the pupil center coordinates are obtained through image analysis of a face image and the eye region of interest is determined from them, which greatly reduces cost and enables much wider application.
It should be understood that the numbering of the steps in the above embodiments does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Corresponding to the region-of-interest detection method described in the above embodiments, Fig. 9 shows a structural diagram of an embodiment of a region-of-interest detection apparatus provided in an embodiment of the present application.
In this embodiment, a region-of-interest detection apparatus may include:
a face image acquisition module 901, configured to acquire a target face image to be detected;
a head pose calculation module 902, configured to calculate the head pose of the target face image;
an eye image extraction module 903, configured to extract a left-eye image and a right-eye image from the target face image;
a gaze direction determination module 904, configured to determine a gaze direction according to the left-eye image, the right-eye image, and the head pose;
an eye key point detection module 905, configured to perform eye key point detection in the left-eye image and the right-eye image respectively to obtain the coordinates of the left pupil center point and the coordinates of the right pupil center point;
an eye region-of-interest determination module 906, configured to determine an eye region of interest according to the gaze direction, the coordinates of the left pupil center point, and the coordinates of the right pupil center point.
Further, the eye region-of-interest determination module may include:
a center point coordinate calculation submodule, configured to calculate the coordinates of the binocular pupil center point according to the coordinates of the left pupil center point and the coordinates of the right pupil center point;
a point-to-plane distance calculation submodule, configured to calculate the point-to-plane distance between the binocular pupil center point and a preset screen according to the coordinates of the binocular pupil center point;
an eye point-of-interest coordinate calculation submodule, configured to calculate the coordinates of the eye point of interest according to the gaze direction, the coordinates of the binocular pupil center point, and the point-to-plane distance;
an eye region-of-interest determination submodule, configured to determine the eye region of interest according to the coordinates of the eye point of interest.
Further, the eye region-of-interest determination submodule may include:
a coordinate conversion unit, configured to convert the coordinates of the eye point of interest according to the coordinates of a preset reference pixel to obtain the pixel position of the eye point of interest on the screen;
a pixel position judgment unit, configured to judge, according to a preset screen resolution, whether the pixel position is within the range of the screen;
a screen region determination unit, configured to determine, if the pixel position is within the range of the screen, the screen region where the pixel position is located according to a preset screen region division rule;
an eye region-of-interest determination unit, configured to determine the screen region where the pixel position is located as the eye region of interest.
Further, the coordinate conversion unit may include:
a distance calculation subunit, configured to calculate a first distance and a second distance according to the coordinates of the reference pixel and the coordinates of the eye point of interest, where the first distance is the distance between the reference pixel and the eye point of interest in the direction of a preset first coordinate axis, and the second distance is the distance between the reference pixel and the eye point of interest in the direction of a preset second coordinate axis;
a first pixel position calculation subunit, configured to calculate the pixel position of the eye point of interest in the direction of the first coordinate axis according to the first distance and a preset first conversion coefficient;
a second pixel position calculation subunit, configured to calculate the pixel position of the eye point of interest in the direction of the second coordinate axis according to the second distance and a preset second conversion coefficient.
Further, the gaze direction determination module is specifically configured to input the left-eye image, the right-eye image, and the head pose into a pre-trained gaze prediction model for processing to obtain the gaze direction;
the gaze direction determination module may include:
a feature information extraction submodule, configured to extract feature information from the left-eye image and the right-eye image respectively to obtain left-eye feature information and right-eye feature information;
a binocular feature information determination submodule, configured to fuse the left-eye feature information and the right-eye feature information to obtain binocular feature information;
a gaze direction determination submodule, configured to fuse the binocular feature information and the head pose to obtain the gaze direction.
Further, the region-of-interest detection apparatus may further include:
a sample set construction module, configured to construct a training sample set, where the training sample set includes SN training samples, each training sample includes a pre-collected left-eye image, right-eye image, and head pose of a subject, the label of each training sample is a pre-calibrated gaze direction, and SN is a positive integer;
a model training module, configured to train an initial gaze prediction model using the training sample set to obtain the pre-trained gaze prediction model.
Further, the region-of-interest detection apparatus may further include:
a facial feature information extraction module, configured to extract facial feature information from the target face image;
a user information determination module, configured to determine user information corresponding to the target face image according to the facial feature information;
a screen display information determination module, configured to determine screen display information corresponding to the eye region of interest;
a correspondence establishment module, configured to establish a correspondence between the user information and the screen display information.
Those skilled in the art can clearly understand that, for convenience and brevity of description, reference may be made to the corresponding processes in the foregoing method embodiments for the specific working processes of the apparatus, modules, and units described above, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed in one embodiment, reference may be made to the related descriptions of other embodiments.
Fig. 10 shows a schematic block diagram of a terminal device provided in an embodiment of the present application; for ease of description, only the parts related to the embodiments of the present application are shown.
As shown in Fig. 10, the terminal device 10 of this embodiment includes a processor 100, a memory 101, and a computer program 102 stored in the memory 101 and executable on the processor 100. When the processor 100 executes the computer program 102, the steps in the above embodiments of the region-of-interest detection method are implemented, for example steps S101 to S106 shown in Fig. 1. Alternatively, when the processor 100 executes the computer program 102, the functions of the modules/units in the above apparatus embodiments are implemented, for example the functions of modules 901 to 906 shown in Fig. 9.
Exemplarily, the computer program 102 may be divided into one or more modules/units, which are stored in the memory 101 and executed by the processor 100 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution process of the computer program 102 in the terminal device 10.
The terminal device 10 may be a computing device such as a desktop computer, a notebook, a palmtop computer, a smartphone, a server, or a smart screen. Those skilled in the art can understand that Fig. 10 is only an example of the terminal device 10 and does not constitute a limitation on it; the terminal device may include more or fewer components than shown, combine certain components, or have different components. For example, the terminal device 10 may further include input/output devices, network access devices, buses, and the like.
所述处理器100可以是中央处理单元(Central Processing Unit,CPU),还可以是其它通用处理器、数字信号处理器 (Digital Signal Processor,DSP)、专用集成电路 (Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA) 或者其它可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。所述处理器100可以是所述终端设备10的神经中枢和指挥中心,所述处理器100可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。The processor 100 may be a central processing unit (Central Processing Unit, CPU), it can also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), and application-specific integrated circuits (Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like. The processor 100 may be the nerve center and command center of the terminal device 10, and the processor 100 may generate operation control signals according to instruction operation codes and timing signals, and complete the control of fetching instructions and executing instructions.
The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or an internal memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card equipped on the terminal device 10. Further, the memory 101 may include both an internal storage unit and an external storage device of the terminal device 10. The memory 101 is used to store the computer program and other programs and data required by the terminal device 10, and may also be used to temporarily store data that has been output or is to be output.
The terminal device 10 may further include a communication module, which may provide communication solutions applied to network devices, including wireless local area networks (WLAN, such as Wi-Fi networks), Bluetooth, ZigBee, mobile communication networks, global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) and the like. The communication module may be one or more devices integrating at least one communication processing module. The communication module may include an antenna, which may have a single element or be an antenna array with multiple elements. The communication module may receive electromagnetic waves through the antenna, frequency-modulate and filter the electromagnetic wave signals, and send the processed signals to the processor; it may also receive signals to be sent from the processor, frequency-modulate and amplify them, and convert them into electromagnetic waves radiated through the antenna.
The terminal device 10 may further include a power management module, which may receive input from an external power supply, a battery and/or a charger, and supply power to the processor, the memory, the communication module and the like.
The terminal device 10 may further include a display module, which may be used to display information input by the user or information provided to the user. The display module may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, a touch panel may cover the display panel; when the touch panel detects a touch operation on or near it, the operation is transmitted to the processor to determine the type of touch event, and the processor then provides a corresponding visual output on the display panel according to that type.
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only intended to distinguish them from one another, and are not intended to limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which will not be repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed or recorded in one embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. Skilled professionals may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; the division of the modules or units is only a logical function division, and there may be other divisions in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. Furthermore, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, apparatuses or units, and may be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
An embodiment of the present application provides a computer program product which, when run on the terminal device, enables the terminal device to implement the steps in the foregoing method embodiments.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately added or deleted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of the technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (10)

  1. A method for detecting a region of interest, characterized by comprising:
    acquiring a target face image to be detected;
    calculating a head pose of the target face image;
    extracting a left-eye image and a right-eye image from the target face image;
    determining a gaze direction according to the left-eye image, the right-eye image and the head pose;
    performing eye key point detection in the left-eye image and the right-eye image respectively, to obtain coordinates of a left-eye pupil center point and coordinates of a right-eye pupil center point;
    determining an eye region of interest according to the gaze direction, the coordinates of the left-eye pupil center point and the coordinates of the right-eye pupil center point.
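By way of illustration only, the six steps of claim 1 can be read as a straightforward pipeline. The Python sketch below shows one possible wiring; every component (face_detector, pose_estimator, gaze_model, landmark_model, screen) is a hypothetical placeholder introduced here for clarity, not a module named in the application.

```python
def detect_region_of_interest(frame, face_detector, pose_estimator,
                              gaze_model, landmark_model, screen):
    """Hypothetical end-to-end sketch of the six claimed steps."""
    # Step 1: acquire the target face image to be detected.
    face_img = face_detector.crop_face(frame)
    # Step 2: calculate the head pose of the target face image.
    head_pose = pose_estimator.estimate(face_img)
    # Step 3: extract the left-eye and right-eye images.
    left_eye, right_eye = face_detector.crop_eyes(face_img)
    # Step 4: determine the gaze direction from both eyes and the head pose.
    gaze_dir = gaze_model.predict(left_eye, right_eye, head_pose)
    # Step 5: detect eye key points to obtain both pupil center coordinates.
    left_pupil = landmark_model.pupil_center(left_eye)
    right_pupil = landmark_model.pupil_center(right_eye)
    # Step 6: determine the eye region of interest from the gaze direction
    # and the two pupil centers.
    return screen.region_for(gaze_dir, left_pupil, right_pupil)
```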
  2. The method for detecting a region of interest according to claim 1, wherein determining the eye region of interest according to the gaze direction, the coordinates of the left-eye pupil center point and the coordinates of the right-eye pupil center point comprises:
    calculating coordinates of a binocular pupil center point according to the coordinates of the left-eye pupil center point and the coordinates of the right-eye pupil center point;
    calculating a point-to-plane distance between the binocular pupil center point and a preset screen according to the coordinates of the binocular pupil center point;
    calculating coordinates of an eye point of interest according to the gaze direction, the coordinates of the binocular pupil center point and the point-to-plane distance;
    determining the eye region of interest according to the coordinates of the eye point of interest.
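A minimal numeric sketch of the geometry recited in claim 2, assuming (as a simplification not stated in the claim) that coordinates are expressed in a camera-centered system whose z = 0 plane coincides with the screen, and that the gaze direction is a 3-D vector pointing toward the screen:

```python
import numpy as np

def eye_point_of_interest(left_pupil, right_pupil, gaze_dir):
    """Intersect the gaze ray with the screen plane z = 0 (illustrative)."""
    left_pupil = np.asarray(left_pupil, dtype=float)
    right_pupil = np.asarray(right_pupil, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)

    # Binocular pupil center: midpoint of the left and right pupil centers.
    center = (left_pupil + right_pupil) / 2.0
    # Point-to-plane distance to the screen; with the plane z = 0 it is |z|.
    distance = abs(center[2])
    # Walk from the binocular center along the gaze direction until the
    # z-component has covered that distance (the gaze must not be parallel
    # to the screen, i.e. gaze_dir[2] != 0).
    t = distance / abs(gaze_dir[2])
    return center + t * gaze_dir

# Example: eyes 500 mm in front of the screen, looking slightly down-right.
print(eye_point_of_interest([-30, 0, 500], [30, 0, 500], [0.1, -0.2, -1.0]))
# -> [  50. -100.    0.]  (x/y land in the screen plane)
```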
  3. The method for detecting a region of interest according to claim 2, wherein determining the eye region of interest according to the coordinates of the eye point of interest comprises:
    converting the coordinates of the eye point of interest according to coordinates of a preset reference pixel point, to obtain a pixel position of the eye point of interest on the screen;
    judging, according to a preset screen resolution, whether the pixel position is within the range of the screen;
    if the pixel position is within the range of the screen, determining, according to a preset screen region division rule, the screen region in which the pixel position is located;
    determining the screen region in which the pixel position is located as the eye region of interest.
  4. The method for detecting a region of interest according to claim 3, wherein converting the coordinates of the eye point of interest according to the coordinates of the preset reference pixel point comprises:
    calculating a first distance and a second distance according to the coordinates of the reference pixel point and the coordinates of the eye point of interest, the first distance being the distance between the reference pixel point and the eye point of interest along a preset first coordinate axis, and the second distance being the distance between the reference pixel point and the eye point of interest along a preset second coordinate axis;
    calculating the pixel position of the eye point of interest along the first coordinate axis according to the first distance and a preset first conversion coefficient;
    calculating the pixel position of the eye point of interest along the second coordinate axis according to the second distance and a preset second conversion coefficient.
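Claims 3 and 4 amount to a linear map from physical screen coordinates to pixels, followed by a region lookup. In the sketch below, taking the reference pixel point as the screen's top-left corner at pixel (0, 0), using pixels-per-millimetre conversion coefficients, and dividing the screen into a 3 x 3 grid are all assumptions made for illustration, not limitations of the claims:

```python
def to_pixel_and_region(poi_xy, ref_xy, px_per_mm_x, px_per_mm_y,
                        resolution=(1920, 1080), grid=(3, 3)):
    """Map an eye point of interest to a pixel position and a screen region."""
    # First and second distances along the two preset coordinate axes.
    dx = poi_xy[0] - ref_xy[0]
    dy = poi_xy[1] - ref_xy[1]
    # First and second conversion coefficients turn distance into pixels.
    px = int(round(dx * px_per_mm_x))
    py = int(round(dy * px_per_mm_y))
    # Outside the preset screen resolution: no region of interest.
    width, height = resolution
    if not (0 <= px < width and 0 <= py < height):
        return (px, py), None
    # Preset region-division rule, here an illustrative 3 x 3 grid,
    # numbered row by row from the top-left.
    cols, rows = grid
    region = (py * rows // height) * cols + (px * cols // width)
    return (px, py), region
```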
  5. The method for detecting a region of interest according to claim 1, wherein determining the gaze direction according to the left-eye image, the right-eye image and the head pose comprises:
    inputting the left-eye image, the right-eye image and the head pose into a pre-trained gaze prediction model for processing, to obtain the gaze direction;
    wherein the processing of the gaze prediction model comprises:
    extracting feature information from the left-eye image and the right-eye image respectively, to obtain left-eye feature information and right-eye feature information;
    fusing the left-eye feature information and the right-eye feature information to obtain binocular feature information;
    fusing the binocular feature information and the head pose to obtain the gaze direction.
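The two-stream fusion described in claim 5 could take many forms; the application does not prescribe a specific network. Purely as an illustration, a compact PyTorch sketch with arbitrary layer sizes (and a weight-shared eye extractor, which is an editorial choice) might look like this:

```python
import torch
import torch.nn as nn

class GazeNet(nn.Module):
    """Illustrative gaze prediction model: per-eye feature extraction,
    binocular fusion, then fusion with the head pose."""

    def __init__(self):
        super().__init__()
        # Feature extractor applied to each single-channel eye image.
        self.eye_net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fuse_eyes = nn.Linear(32 * 2, 64)  # binocular feature fusion
        self.fuse_pose = nn.Linear(64 + 3, 3)   # fuse with pitch/yaw/roll

    def forward(self, left_eye, right_eye, head_pose):
        left = self.eye_net(left_eye)    # left-eye feature information
        right = self.eye_net(right_eye)  # right-eye feature information
        both = torch.relu(self.fuse_eyes(torch.cat([left, right], dim=1)))
        gaze = self.fuse_pose(torch.cat([both, head_pose], dim=1))
        # Normalize to a unit gaze direction vector.
        return gaze / gaze.norm(dim=1, keepdim=True)
```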
  6. The method for detecting a region of interest according to any one of claims 1 to 5, wherein after determining the eye region of interest according to the gaze direction, the coordinates of the left-eye pupil center point and the coordinates of the right-eye pupil center point, the method further comprises:
    extracting face feature information from the target face image;
    determining user information corresponding to the target face image according to the face feature information;
    determining screen display information corresponding to the eye region of interest;
    establishing a correspondence between the user information and the screen display information.
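Claim 6 is essentially bookkeeping: link the recognized viewer to whatever content occupied the watched region. A toy sketch of that record-keeping, with the recognizer and the content lookup both hypothetical, could be:

```python
from collections import defaultdict

view_log = defaultdict(list)  # user_id -> list of content items viewed

def log_attention(face_img, region, recognizer, screen_content):
    """Record which on-screen content a recognized user looked at."""
    features = recognizer.extract(face_img)   # face feature information
    user_id = recognizer.identify(features)   # user matching those features
    content = screen_content[region]          # e.g. the ad shown in that region
    view_log[user_id].append(content)         # user <-> display correspondence
```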
  7. An apparatus for detecting a region of interest, characterized by comprising:
    a face image acquisition module, configured to acquire a target face image to be detected;
    a head pose calculation module, configured to calculate a head pose of the target face image;
    an eye image extraction module, configured to extract a left-eye image and a right-eye image from the target face image;
    a gaze direction determination module, configured to determine a gaze direction according to the left-eye image, the right-eye image and the head pose;
    an eye key point detection module, configured to perform eye key point detection in the left-eye image and the right-eye image respectively, to obtain coordinates of a left-eye pupil center point and coordinates of a right-eye pupil center point;
    an eye region of interest determination module, configured to determine an eye region of interest according to the gaze direction, the coordinates of the left-eye pupil center point and the coordinates of the right-eye pupil center point.
  8. The apparatus for detecting a region of interest according to claim 7, wherein the eye region of interest determination module comprises:
    a center point coordinate calculation sub-module, configured to calculate coordinates of a binocular pupil center point according to the coordinates of the left-eye pupil center point and the coordinates of the right-eye pupil center point;
    a point-to-plane distance calculation sub-module, configured to calculate a point-to-plane distance between the binocular pupil center point and a preset screen according to the coordinates of the binocular pupil center point;
    an eye point of interest coordinate calculation sub-module, configured to calculate coordinates of an eye point of interest according to the gaze direction, the coordinates of the binocular pupil center point and the point-to-plane distance;
    an eye region of interest determination sub-module, configured to determine the eye region of interest according to the coordinates of the eye point of interest.
  9. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the steps of the method for detecting a region of interest according to any one of claims 1 to 6 are implemented.
  10. A terminal device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the computer program, the steps of the method for detecting a region of interest according to any one of claims 1 to 6 are implemented.
PCT/CN2020/124098 2019-11-21 2020-10-27 Region of concern detection method and apparatus, and readable storage medium and terminal device WO2021098454A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911151904.3 2019-11-21
CN201911151904.3A CN111046744B (en) 2019-11-21 2019-11-21 Method and device for detecting attention area, readable storage medium and terminal equipment

Publications (1)

Publication Number Publication Date
WO2021098454A1 true

Family

ID=70232071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124098 WO2021098454A1 (en) 2019-11-21 2020-10-27 Region of concern detection method and apparatus, and readable storage medium and terminal device

Country Status (2)

Country Link
CN (1) CN111046744B (en)
WO (1) WO2021098454A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113516074A (en) * 2021-07-08 2021-10-19 西安邮电大学 Online examination system anti-cheating method based on pupil tracking
CN116820246A (en) * 2023-07-06 2023-09-29 上海仙视电子科技有限公司 Screen adjustment control method and device with self-adaptive visual angle

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909611B (en) * 2019-10-29 2021-03-05 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
CN111046744B (en) * 2019-11-21 2023-04-18 深圳云天励飞技术股份有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment
CN111680546A (en) * 2020-04-26 2020-09-18 北京三快在线科技有限公司 Attention detection method, attention detection device, electronic equipment and storage medium
CN111626240B (en) * 2020-05-29 2023-04-07 歌尔科技有限公司 Face image recognition method, device and equipment and readable storage medium
CN111767820A (en) * 2020-06-23 2020-10-13 京东数字科技控股有限公司 Method, device, equipment and storage medium for identifying object concerned
CN111767821B (en) * 2020-06-23 2024-04-09 京东科技控股股份有限公司 Method, device, equipment and storage medium for identifying focused object
CN111796874A (en) * 2020-06-28 2020-10-20 北京百度网讯科技有限公司 Equipment awakening method and device, computer equipment and storage medium
CN111881763A (en) 2020-06-30 2020-11-03 北京小米移动软件有限公司 Method and device for determining user gaze position, storage medium and electronic equipment
CN112317362A (en) * 2020-09-24 2021-02-05 赣州好朋友科技有限公司 Method and device for sorting quartz associated gold ore and readable storage medium
CN112308932B (en) * 2020-11-04 2023-12-08 中国科学院上海微系统与信息技术研究所 Gaze detection method, device, equipment and storage medium
CN112416126B (en) * 2020-11-18 2023-07-28 青岛海尔科技有限公司 Page scrolling control method and device, storage medium and electronic equipment
CN112527103B (en) * 2020-11-24 2022-07-22 安徽鸿程光电有限公司 Remote control method and device for display equipment, equipment and computer readable storage medium
CN112711982A (en) * 2020-12-04 2021-04-27 科大讯飞股份有限公司 Visual detection method, equipment, system and storage device
CN112804504B (en) * 2020-12-31 2022-10-04 成都极米科技股份有限公司 Image quality adjusting method, image quality adjusting device, projector and computer readable storage medium
CN113052064B (en) * 2021-03-23 2024-04-02 北京思图场景数据科技服务有限公司 Attention detection method based on face orientation, facial expression and pupil tracking
CN113115086B (en) * 2021-04-16 2023-09-19 浙江闪链科技有限公司 Method for collecting elevator media viewing information based on video line-of-sight identification
CN113128417B (en) * 2021-04-23 2023-04-07 南开大学 Double-region eye movement tracking method based on head posture
CN113918007B (en) * 2021-04-27 2022-07-05 广州市保伦电子有限公司 Video interactive operation method based on eyeball tracking
CN113849142A (en) * 2021-09-26 2021-12-28 深圳市火乐科技发展有限公司 Image display method and device, electronic equipment and computer readable storage medium
CN114390267A (en) * 2022-01-11 2022-04-22 宁波视睿迪光电有限公司 Method and device for synthesizing stereo image data, electronic equipment and storage medium
CN115294320B (en) * 2022-10-08 2022-12-20 平安银行股份有限公司 Method and device for determining image rotation angle, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1700242A (en) * 2005-06-15 2005-11-23 北京中星微电子有限公司 Method and apparatus for distinguishing direction of visual lines
CN108345848A (en) * 2018-01-31 2018-07-31 广东欧珀移动通信有限公司 The recognition methods of user's direction of gaze and Related product
US20190043218A1 (en) * 2018-06-28 2019-02-07 Matthew Hiltner Multiple subject attention tracking
CN111046744A (en) * 2019-11-21 2020-04-21 深圳云天励飞技术有限公司 Method and device for detecting attention area, readable storage medium and terminal equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105704369B (en) * 2016-01-20 2019-02-15 努比亚技术有限公司 A kind of information processing method and device, electronic equipment
CN109716268B (en) * 2016-09-22 2022-05-17 苹果公司 Eye and head tracking
JP2018205819A (en) * 2017-05-30 2018-12-27 富士通株式会社 Gazing position detection computer program, gazing position detection device, and gazing position detection method
CN109271914B (en) * 2018-09-07 2020-04-17 百度在线网络技术(北京)有限公司 Method, device, storage medium and terminal equipment for detecting sight line drop point

Also Published As

Publication number Publication date
CN111046744B (en) 2023-04-18
CN111046744A (en) 2020-04-21

Similar Documents

Publication Publication Date Title
WO2021098454A1 (en) Region of concern detection method and apparatus, and readable storage medium and terminal device
US20200387698A1 (en) Hand key point recognition model training method, hand key point recognition method and device
WO2021082635A1 (en) Region of interest detection method and apparatus, readable storage medium and terminal device
CN110348543B (en) Fundus image recognition method and device, computer equipment and storage medium
US20200058156A1 (en) Dense three-dimensional correspondence estimation with multi-level metric learning and hierarchical matching
US20150117725A1 (en) Method and electronic equipment for identifying facial features
CN111914812B (en) Image processing model training method, device, equipment and storage medium
US20210272306A1 (en) Method for training image depth estimation model and method for processing image depth information
CN104169965A (en) Systems, methods, and computer program products for runtime adjustment of image warping parameters in a multi-camera system
CN110544272A (en) face tracking method and device, computer equipment and storage medium
WO2020042968A1 (en) Method for acquiring object information, device, and storage medium
EP4116462A2 (en) Method and apparatus of processing image, electronic device, storage medium and program product
CN112036331A (en) Training method, device and equipment of living body detection model and storage medium
CN111914180B (en) User characteristic determining method, device, equipment and medium based on graph structure
KR20220062460A (en) Method and apparatus for recognizing parking violation of vehicle, electronic device, storage medium, and computer program
CN114279433A (en) Map data automatic production method, related device and computer program product
KR102496334B1 (en) Method and device for detecting body temperature, electronic apparatus and storage medium
WO2021082636A1 (en) Region of interest detection method and apparatus, readable storage medium and terminal device
CN113628239A (en) Display optimization method, related device and computer program product
CN112818979A (en) Text recognition method, device, equipment and storage medium
CN113743186B (en) Medical image processing method, device, equipment and storage medium
CN113269730B (en) Image processing method, image processing device, computer equipment and storage medium
CN114140839B (en) Image transmission method, device, equipment and storage medium for face recognition
CN114663929A (en) Face recognition method, device, equipment and storage medium based on artificial intelligence
CN116524160B (en) Product consistency auxiliary verification system and method based on AR identification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20890665

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20890665

Country of ref document: EP

Kind code of ref document: A1