CN114610150A - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN114610150A
CN114610150A (Application CN202210226571.1A)
Authority
CN
China
Prior art keywords
image
target user
area
rendering
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210226571.1A
Other languages
Chinese (zh)
Inventor
卞琛毓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hode Information Technology Co Ltd
Original Assignee
Shanghai Hode Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hode Information Technology Co Ltd filed Critical Shanghai Hode Information Technology Co Ltd
Priority to CN202210226571.1A
Publication of CN114610150A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013 Eye tracking input arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/005 General purpose rendering architectures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, which comprises the following steps: acquiring an eye image of a target user; determining a first gaze position of the target user on a display interface according to the eye image; determining, according to the first gaze position, a first area corresponding to the first gaze position in an image to be displayed and a second area of the image to be displayed other than the first area; and rendering the first area and the second area respectively to obtain a first rendered image and a second rendered image, and combining the first rendered image and the second rendered image to obtain a first target image, wherein the image quality of the first rendered image is higher than that of the second rendered image. With this technical solution, the point of attention is determined from the eye image, the image near the point of attention is rendered at high quality, and the image outside the point of attention is rendered at lower quality, so that the hardware performance requirement and load are reduced while the user's viewing experience is preserved; no complex eye-tracking equipment is needed, and the cost is low.

Description

Image processing method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, a computer device, and a computer-readable storage medium.
Background
As video quality improves with technological advances, the resolution of mainstream video has reached the 4K level (e.g., 4096 × 2160), and some devices and video sources already support 8K resolution (e.g., 7680 × 4320). Although higher video quality improves the viewing experience, the continued increase in resolution places ever higher demands on device hardware. There are two reasons. First, each step from 1080p to 4K and then to 8K multiplies the number of pixels to be rendered by roughly four. Second, high-resolution video still has strict real-time requirements; for highly detailed video, insufficient smoothness degrades the viewing experience.
Disclosure of Invention
The application aims to provide an image processing method, an image processing apparatus, a computer device and a computer-readable storage medium, so as to solve the following technical problem: when rendering high-resolution images, it is difficult to balance image quality, required processing performance and the user's viewing experience.
An aspect of an embodiment of the present application provides an image processing method, including:
acquiring, through a camera, an eye image of a target user, wherein the eye image is a continuous multi-frame image;
determining a first fixation position of the target user on a display interface according to the eye image;
determining a first area corresponding to the first gaze position in an image to be displayed and a second area except the first area in the image to be displayed according to the first gaze position;
rendering the first area and the second area in the image to be displayed respectively to obtain a first rendered image and a second rendered image, and combining the first rendered image and the second rendered image to obtain a first target image, wherein the image quality of the first rendered image is higher than that of the second rendered image.
Optionally, the acquiring an eye image of a target user includes:
acquiring a face image of the target user;
carrying out binarization processing on the face image by using at least two different thresholds to obtain at least two binarized face images;
and determining the eye image of the target user according to the at least two binarization face images.
Optionally, the determining the eye image of the target user according to at least two of the binarized face images includes:
determining a human eye pupil region according to the circular regions in the at least two binarized face images;
determining the eye image according to the human eye pupil region.
Optionally, the determining a first gaze location of the target user on a display interface according to the eye image includes:
processing the eye image of the current frame by using a pre-trained gaze position prediction model to obtain the eye pupil movement offset of the target user of the current frame;
determining the initial predicted gaze position of the target user of the current frame according to the initial predicted gaze position of the target user of the previous frame and the human eye pupil movement offset of the target user of the current frame;
and determining the first gaze position of the target user of the current frame according to the initial predicted gaze position of the target user of the current frame and the predetermined eyeball states of a plurality of preset calibration points of the target user on the display interface.
Optionally, the determining the first gaze location of the target user of the current frame according to the initial predicted gaze location of the target user of the current frame and a predetermined eyeball state of the target user at a plurality of preset calibration points on the display interface includes:
determining the eyeball state of the current frame of the target user according to the eye image of the current frame of the target user;
determining at least three preset calibration points adjacent to the initial predicted fixation position according to the eyeball state of the current frame of the target user and the predetermined eyeball states of a plurality of preset calibration points of the target user on the display interface;
and determining the first gaze location according to the initial predicted gaze location and at least three of the preset calibration points adjacent to the initial predicted gaze location.
Optionally, the determining the first gaze location according to the initial predicted gaze location and at least three of the preset calibration points adjacent to the initial predicted gaze location includes:
and adjusting the initial predicted gaze position of the target user of the current frame, and taking the adjusted initial predicted gaze position as the first gaze position when the sum of the distances between the adjusted initial predicted gaze position and the at least three preset calibration points reaches a minimum.
Optionally, before acquiring the eye image of the target user, the method further comprises:
acquiring an eyeball image of the target user when the preset calibration point on the display interface is taken as a fixation position;
and determining the eyeball state of the target user when the target user watches the preset calibration point on the display interface according to the eyeball image of the target user when the preset calibration point on the display interface is taken as the gazing position.
Optionally, the determining, according to the eye image, a first gaze location of the target user on a display interface further includes:
and determining an initial predicted fixation position of the first frame of the target user according to the predetermined eyeball states of the target user viewing a plurality of preset calibration points on the display interface and the eyeball image of the first frame of the target user.
Optionally, the determining, according to the first gaze location, a first region corresponding to the first gaze location in the image to be displayed and a second region other than the first region in the image to be displayed includes:
and dividing the image to be displayed into a plurality of rectangular areas, and taking an area including the first gaze position as the first area and an area not including the first gaze position as the second area.
Optionally, the determining, according to the first gaze location, a first region corresponding to the first gaze location in the image to be displayed and a second region other than the first region in the image to be displayed includes:
and taking, as the first area, a circular area of a certain radius in the image to be displayed centered on the first gaze location, and taking the area outside the first area as the second area.
Optionally, the rendering the first region and the second region in the image to be displayed respectively to obtain a first rendered image and a second rendered image includes:
rendering the first area in the image to be displayed according to the original image quality to obtain a first rendered image;
and downsampling the second area in the image to be displayed or the image to be displayed, and rendering the downsampled second area in the image to be displayed or the image to be displayed to obtain a second rendered image.
Optionally, the merging the first rendered image and the second rendered image to obtain a first target image includes:
feathering an edge of the first rendered image;
and superposing the first rendering image and the second rendering image which are subjected to the feathering processing to obtain the first target image.
Optionally, the method further comprises:
determining a motion parameter of the first gaze location according to a plurality of continuous first gaze locations of the target user on a display interface before the current time;
determining a second gaze position of the target user at the current time according to the motion parameters of the first gaze position and a plurality of consecutive first gaze positions before the current time;
when the first gaze position of the target user at the current moment is not determined, determining a third area corresponding to the second gaze position in the image to be displayed and a fourth area except the third area in the image to be displayed according to the second gaze position;
rendering the third area and the fourth area in the image to be displayed respectively to obtain a third rendered image and a fourth rendered image, and combining the third rendered image and the fourth rendered image to obtain a second target image, wherein the image quality of the third rendered image is higher than that of the fourth rendered image.
Optionally, the eye image is acquired by a monocular camera.
An aspect of an embodiment of the present application further provides an image processing apparatus, including:
the eye image acquisition module is used for acquiring an eye image of a target user, wherein the eye image is a continuous multi-frame image;
the first fixation position determining module is used for determining a first fixation position of the target user on a display interface according to the eye image;
the area determining module is used for determining a first area corresponding to the first gaze position in the image to be displayed and a second area except the first area in the image to be displayed according to the first gaze position;
and the rendering module is used for respectively rendering the first area and the second area in the image to be displayed to obtain a first rendered image and a second rendered image, and combining the first rendered image and the second rendered image to obtain a first target image, wherein the image quality of the first rendered image is higher than that of the second rendered image.
An aspect of the embodiments of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the image processing method as described above when executing the computer program.
An aspect of the embodiments of the present application further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the image processing method as described above.
The image processing method, the image processing device, the computer equipment and the computer readable storage medium provided by the embodiment of the application have the following advantages:
by acquiring the eye image of the target user and determining the gaze position of the target user on the display interface according to the eye image, the region corresponding to the gaze position in the image to be displayed is rendered at high quality and the remaining regions are rendered at lower quality, so that the hardware performance requirement and load are reduced while the user's viewing experience is ensured; no complex eye-tracking equipment is needed, and the cost is low.
Drawings
Figure 1 schematically shows a human eye observation simulation;
fig. 2 is a schematic diagram schematically showing an application environment of the image processing method of the present application;
fig. 3 schematically shows a flow chart of an image processing method according to a first embodiment of the present application;
FIG. 4 is a flowchart illustrating sub-steps of step S300 in FIG. 3;
fig. 5 schematically shows an example of the multi-threshold binarization processing of the face image in step S402 in fig. 4;
FIG. 6 is a flowchart illustrating sub-steps of step S302 in FIG. 3;
FIG. 7 schematically illustrates an eye tracking data example;
FIG. 8 is a flowchart of the substeps of step S604 of FIG. 6;
FIG. 9 schematically illustrates preset calibration points of a display interface;
FIG. 10 is a flow chart schematically illustrating additional steps of an image processing method according to a first embodiment of the present application;
FIG. 11 is a flowchart illustrating additional substeps of step S302 of FIG. 3;
FIG. 12 is a flowchart illustrating sub-steps of step S306 in FIG. 3;
fig. 13 is a flowchart illustrating sub-steps of step S1206 of fig. 12;
FIG. 14 is a flow chart schematically illustrating additional steps of an image processing method according to a first embodiment of the present application;
fig. 15 schematically shows a block diagram of an image processing apparatus according to a second embodiment of the present application;
fig. 16 schematically shows a hardware architecture diagram of a computer device suitable for implementing an image processing method according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the descriptions relating to "first", "second", etc. in the embodiments of the present application are only for descriptive purposes and are not to be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that such a combination can be realized by a person skilled in the art; when the combined technical solutions are contradictory or cannot be realized, the combination should be considered not to exist and falls outside the protection scope of the present application.
In the description of the present application, it should be understood that the numerical references before the steps do not identify the order of performing the steps, but merely serve to facilitate the description of the present application and to distinguish each step, and therefore should not be construed as limiting the present application.
Figure 1 schematically shows a human eye observation simulation.
Although the human eye has a wide field of view, about 220 degrees horizontally and about 135 degrees vertically, the cone cells used to capture sharp visual detail account for only about 6% of the photoreceptors; the far more numerous rod cells provide only blurred visual information, and only in colours biased towards shades of gray. Referring to fig. 1, the image information that the human eye can really recognize well is only a small area near the eye's point of attention (the dotted area in the figure), occupying only a small portion of the screen and corresponding to a viewing angle of only 1-2 degrees.
In view of this, for a high-resolution video, if the video can be appropriately down-sampled so that the portions with low accuracy requirements (the area outside the point of attention of the human eye) are rendered at low resolution, the pressure on machine performance can be relieved to a great extent, improving video smoothness and prolonging hardware life without reducing the user's viewing experience.
The currently preferred method of determining the point of attention of the human eye uses an eye tracker, which has a high degree of accuracy. However, this method requires specialized eye-tracking equipment, which is inconvenient and costly. In view of this, the present application provides a method that realizes human eye tracking based on eye images acquired by a camera and then performs image processing based on the human eye's point of attention, which can reduce the demand that image processing places on machine performance and relieve performance pressure without requiring complex and expensive professional equipment.
Fig. 2 schematically shows a schematic diagram of an application environment of the image processing method of the present application.
As shown in fig. 2, the computer device 20 includes a display screen 21 and a camera 22, as well as a processor (not shown) and other components. The computer device 20 may play a video on the display screen 21 and the user 23 may view the video played on the display screen 21.
The computer device 20 may be a variety of computers or mobile terminals such as a notebook, desktop, tablet, smartphone, or the like.
In order to alleviate the performance pressure that playing high-resolution video places on the computer device 20, the computer device 20 may perform local high-precision rendering of the played video according to the gaze position (or point of attention) of the user 23 on the display screen 21. Specifically, the region of the video image to be displayed that corresponds to the user's gaze position is rendered with high precision, for example at the original high resolution, while the region outside it is down-sampled and rendered at a reduced resolution. Because the image in the gaze-position region is still rendered with high precision, the user's viewing experience is ensured; and because the region outside the gaze position has little influence on the viewing experience, rendering it at reduced resolution relieves the performance pressure that rendering the video places on the computer device 20.
In the application, the computer device 20 may acquire a face image of the user 23 through the camera 22, further extract an eye image from the face image, and then determine a gaze position of the user on the display screen 21 according to the eye image of the user, so as to perform local high-precision rendering on the played video based on the gaze position. Therefore, the camera of the computer equipment can be used for tracking the human eyes, complex and expensive professional equipment is not needed, the cost is low, and the application range is wide.
A plurality of embodiments will be provided below, and the embodiments provided below can be used to implement the scheme of image processing described above. For ease of understanding, the following description will be exemplarily described with a computer device/server as the execution subject.
Example one
The embodiment provides an image processing method, which is applied to an electronic/computer device equipped with a camera, and specific technical details and effects can be referred to as follows.
Fig. 3 schematically shows a flowchart of an image processing method according to a first embodiment of the present application.
As shown in fig. 3, an image processing method according to a first embodiment of the present application may include:
step S300, acquiring an eye image of a target user, wherein the eye image is a continuous multi-frame image;
step S302, determining a first fixation position of the target user on a display interface according to the eye image;
step S304, determining a first area corresponding to the first gaze position in an image to be displayed and a second area except the first area in the image to be displayed according to the first gaze position;
step S306, respectively rendering the first region and the second region in the image to be displayed to obtain a first rendered image and a second rendered image, and combining the first rendered image and the second rendered image to obtain a first target image, wherein the quality of the first rendered image is higher than that of the second rendered image.
In step S300, the target user is a user watching a video played by the electronic device/computer device, a camera of the electronic device/computer device collects an image of the target user according to a certain frame rate, and an eye image of the target user is obtained from the collected image. The eyeball state of the target user can then be determined from the eye image of the target user.
In the present application, the eye image is a plurality of frames of eye images which are continuously acquired. That is, in the present application, the camera may continuously acquire images of a target user, and an eye image corresponding to a current time is referred to as an eye image of a current frame in this document.
In step S302, after the eye image of the target user is obtained, a first gaze location of the target user on the display interface may be determined according to the eye image of the target user.
As an example, the pre-trained gaze location prediction model may be utilized to process the eye image of the target user to obtain a first gaze location of the target user on the display interface.
As another example, a movement angle of the target user's eyeball relative to a reference may be obtained from the eye image, where the reference may be, for example, the state in which the user's eyeball looks straight ahead at the display screen of the electronic device. The current spatial coordinates of the pupil are then determined from the movement angle of the eyeball: the rotation angle and direction of the pupil can be determined from the movement angle relative to the reference, and, taking the center of the user's eyeball as the origin, the line connecting the pupil and the eyeball center has a known length and orientation, so the spatial coordinates of the pupil can be obtained. Finally, the current spatial coordinates of the pupil are mapped into the two-dimensional coordinate system of the display interface, and the landing point of the pupil's line of sight in that coordinate system is recorded as the gaze position of the eyeball.
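As a non-limiting illustration of this geometric mapping, the following Python sketch intersects the gaze ray with the plane of the display interface; the coordinate convention, the eyeball radius and the function name gaze_point_on_screen are assumptions made for the example and are not taken from the description.

import numpy as np

def gaze_point_on_screen(eye_center, yaw, pitch, eye_radius=0.012):
    """eye_center: 3-D position of the eyeball centre, in a coordinate system
    whose z = 0 plane is the display interface (units are illustrative).
    yaw / pitch: eyeball rotation angles (radians) relative to looking straight
    ahead at the screen. Returns the 2-D landing point of the gaze on z = 0."""
    # unit gaze direction after rotating the eyeball (straight ahead = -z, towards the screen)
    direction = np.array([np.sin(yaw),
                          np.sin(pitch) * np.cos(yaw),
                          -np.cos(pitch) * np.cos(yaw)])
    pupil = np.asarray(eye_center, dtype=float) + eye_radius * direction
    t = -pupil[2] / direction[2]          # intersect the gaze ray with the screen plane z = 0
    landing = pupil + t * direction
    return landing[:2]                    # gaze position in display-interface coordinates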
In this application, a display interface refers to an interface for displaying a video on a display screen of a computer/electronic device, which may be an entire area or a partial area of the display screen.
In step S304, a first region corresponding to the first gaze location in the image to be displayed and a second region except the first region in the image to be displayed are determined according to the first gaze location.
As one example, the image to be displayed is divided into a plurality of rectangular regions, and a region including the first gaze position is taken as the first region while the regions not including the first gaze position are taken as the second region. A region "including" the first gaze position means that the first gaze position lies within the rectangular region or on its border; when the first gaze position lies on a border, all rectangular regions sharing that border are considered part of the first region.
As another example, a circular area of a certain radius in the image to be displayed, centered on the first gaze position, is taken as the first area, and the area outside the first area is taken as the second area. That is, a circular area centered on the first gaze position is determined as the first area, and its size depends on the distance between the target user and the display interface. Specifically, this distance can be estimated from the eyeball movement distance when the target user views different preset calibration points on the display interface, as described below: if the user must move the eyeball over a large distance when viewing different calibration points, the user is close to the display interface, the visible range on the display interface is small, and a smaller radius may be chosen (i.e., a smaller circular area is selected as the first area); conversely, if the eyeball movement distance is small, the user is far from the display interface, the visible range is large, and a larger radius needs to be chosen (i.e., a larger circular area is selected as the first area).
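As a non-limiting illustration of the two region-division examples above, the following Python sketch determines the first area from the first gaze position; the 3 × 3 grid, the radius heuristic and the function names are illustrative assumptions.

import numpy as np

def first_region_rect(gaze_xy, img_w, img_h, grid=3):
    """Return (x0, y0, x1, y1) of the grid cell containing the gaze point; that
    cell is the first (high-quality) area, the remaining cells form the second area."""
    cell_w, cell_h = img_w / grid, img_h / grid
    col = min(int(gaze_xy[0] // cell_w), grid - 1)
    row = min(int(gaze_xy[1] // cell_h), grid - 1)
    return (int(col * cell_w), int(row * cell_h),
            int((col + 1) * cell_w), int((row + 1) * cell_h))

def circle_radius_for_viewer(base_radius, eye_move_dist, ref_move_dist):
    """Illustrative heuristic for the circular first area: larger eyeball movement
    between calibration points means the viewer sits closer to the screen, so a
    smaller radius is used; smaller movement means a larger radius."""
    return base_radius * ref_move_dist / max(eye_move_dist, 1e-6)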
In the step S306, after determining a first region corresponding to the first gaze location in the image to be displayed and a second region except the first region according to the first gaze location of the target user on the display interface, rendering the first region and the second region in the image to be displayed respectively to obtain a first rendered image and a second rendered image, and combining the first rendered image and the second rendered image to obtain a first target image, wherein the quality of the first rendered image is higher than that of the second rendered image. In other words, local high-precision rendering can be performed on the image to be displayed according to the first gaze position, so that the image quality of the region corresponding to the first gaze position is guaranteed, and meanwhile, the image quality of other regions is reduced, and therefore the performance pressure is relieved while the watching experience of a user is guaranteed.
As an example, the image to be displayed of the current frame may be downsampled according to the first gaze position of the target user on the display interface, for example, the original resolution is maintained for the first area corresponding to the first gaze position in the image to be displayed, and downsampling is performed for the second area (the area except the area corresponding to the first gaze position in the image to be displayed), so as to reduce the resolution of the images of the areas, thereby reducing the performance required for rendering and relieving the performance pressure of the device.
As another example, the image quality of the image to be displayed in the current frame may be enhanced according to the first gaze location of the target user on the display interface, for example, a first region of the image to be displayed corresponding to the first gaze location is enhanced (e.g., upsampling is performed to increase the resolution of the first region), and a second region (other regions of the image to be displayed except for the region corresponding to the first gaze location) is maintained at the original image quality, that is, the image quality is enhanced only for a part of the region, so that the performance required for rendering may be reduced, and the pressure on the performance of the device may be relieved.
It should be understood that, in the present application, the quality of the first rendered image is higher than that of the second rendered image, which may be that the resolution of the first rendered image is higher than that of the second rendered image, for example, the first rendered image is 4K, the second rendered image is 1080P, or that the image parameters of the second rendered image relative to the first rendered image, which affect the image quality or rendering pressure, are reduced.
The image processing method of the embodiment of the application has the following advantages:
by acquiring the eye image of the target user and determining the gaze position of the target user on the display interface according to the eye image, the region corresponding to the gaze position in the image to be displayed is rendered at high quality and the remaining regions are rendered at lower quality, so that the hardware performance requirement and load are reduced while the user's viewing experience is ensured; no complex eye-tracking equipment is needed, and the cost is low.
Some alternative embodiments are provided below.
In an exemplary embodiment, a face image of a target user may be acquired through a camera, and then an eye image of the target user is determined according to the face image, as shown in fig. 4, step S300 may include: step S400, acquiring a face image of the target user; step S402, using at least two different thresholds to carry out binarization processing on the face image so as to obtain at least two binarized face images; step S404, determining the eye image of the target user according to at least two binarization face images.
In step S400, an image of the target user may be captured by the camera, and then a face image may be extracted from the captured image. The extraction of the face image can be performed by using a known face segmentation method or a face detection method, or by using a pre-trained face extraction model.
In the step S402, after the face image of the target user is obtained, at least two different thresholds are used to binarize the face image, yielding at least two binarized face images. As an example, the face image may be binarized using three different thresholds, resulting in three binarized images.
In step S404, the eye image of the target user is determined according to at least two of the binarized face images. Specifically, this can be done as follows. First, the human eye pupil region is determined from the circular regions in the at least two binarized face images: the circular regions in each binarized face image are located, their positions are compared across the binarized images, and if a circular region appears at the same position in all of the binarized face images, it is considered the pupil region. The eye image is then determined from the pupil region; for example, the pupil region is expanded outwards by a certain amount in the face image of the target user to obtain the eye region, and the image of the eye region is then cropped from the face image as the eye image.
The principle of determining the eye image in this way is as follows: most human pupils (the example here considers Asian users) are dark, so the pupil remains black under multiple binarization thresholds, and the pupil is circular; therefore, the pupil region can be found simply by searching for a circular region common to all of the binarized images, the eye region can then be obtained by enlarging that region appropriately, and the image of that region is cropped as the eye image.
The method is described taking Asian users as an example; for users in other regions, the approach can be adjusted appropriately, or the eye image can be determined from the face image in other ways, for example by extracting it with a pre-trained eye detection model.
By the scheme, the camera of the electronic device/computer device is used for collecting the face image, and then the eye image of the target user can be obtained by processing the face image, so that the method is simple and convenient, the cost is low, and no additional device is required.
Fig. 5 shows an example of performing multi-threshold binarization processing on a face image.
As shown in fig. 5, the top-left image is the collected face image of the target user. In this example, three binarized face images are obtained by binarizing the face image with three different thresholds. It can be seen from the three binarized face images that only the pupil area of the human eye is a black circular area, so the pupil area can be located by searching for the circular area in the three binarized face images; after locating the pupil area, it is expanded appropriately to obtain the eye area, from which the eye image is obtained.
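As a non-limiting illustration of the multi-threshold binarization and circle-search idea, the following OpenCV sketch can be used; the threshold values, Hough-circle parameters and the expansion factor are illustrative assumptions rather than values from the description.

import cv2
import numpy as np

def locate_eye_region(face_bgr, thresholds=(40, 60, 80), pad=1.5):
    """Binarize the face image with several thresholds, keep a circle that appears
    at (roughly) the same position in every binarized image as the pupil, then
    expand it into an eye region and crop the eye image."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    per_threshold_circles = []
    for t in thresholds:
        _, binary = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY)
        circles = cv2.HoughCircles(binary, cv2.HOUGH_GRADIENT, dp=1.2, minDist=20,
                                   param1=50, param2=20, minRadius=3, maxRadius=30)
        if circles is None:
            return None
        per_threshold_circles.append(circles[0])       # (N, 3) rows of x, y, r
    for x, y, r in per_threshold_circles[0]:
        # accept the circle only if every other binarized image has a circle nearby
        if all(np.min(np.hypot(c[:, 0] - x, c[:, 1] - y)) < r
               for c in per_threshold_circles[1:]):
            half = int(r * pad * 2)                    # half-width of the eye region box
            x0, y0 = max(int(x) - half, 0), max(int(y) - half, 0)
            return face_bgr[y0:int(y) + half, x0:int(x) + half]   # cropped eye image
    return None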
In an exemplary embodiment, the pre-trained gaze location prediction model may be utilized to process the eye image to obtain the first gaze location of the target user on the display interface, as shown in fig. 6, and step S302 may include: step S600, processing the eye image of the current frame by using a pre-trained gaze position prediction model to obtain the eye pupil movement deviation of the target user of the current frame; step S602, determining the initial predicted gaze position of the target user of the current frame according to the initial predicted gaze position of the target user of the previous frame and the human eye pupil movement offset of the target user of the current frame; step 604, determining the first gaze position of the target user in the current frame according to the initial predicted gaze position of the target user in the current frame and the predetermined eyeball states of the target user viewing a plurality of preset calibration points on the display interface.
In step S600, the eye image of the current frame is processed by using the pre-trained gaze location prediction model to obtain the eye pupil movement offset of the target user of the current frame.
As an example, the pre-trained gaze position prediction model may use a Transformer network (proposed by Google Inc.), because the gaze position of the human eye is not related solely to the current moment but keeps moving from the position at the previous moment. The pre-trained gaze position prediction model predicts the gaze position of the current frame by taking the eye images of consecutive frames as input; that is, the eye images of the target user acquired in step S300 are continuously input into the model, which processes the continuously input groups of eye images to obtain the gaze position of the current frame. It should be appreciated that, for the pre-trained gaze position prediction model, the input is an eye image and the output is a three-dimensional offset of the human eye's gaze position. The offset takes a three-dimensional form because the target user's head may move, so the offset of the actual gaze position is a three-dimensional motion. The network cannot directly output the actual gaze position, because differences in the camera's shooting distance and in the distance between the user's eyes and the screen change the amplitude of the user's eye movements; the gaze position prediction model can therefore only output the movement offset of the human eye pupil.
The gaze location prediction model may be trained using common training methods and open source data sets. Illustratively, in the present application, the gaze location prediction model is trained using eye tracking data of the TEyeD [2] database as shown in FIG. 7.
In step S602, after obtaining the eye pupil movement offset of the target user in the current frame, the initial predicted gaze position of the target user in the current frame is obtained by combining it with the initial predicted gaze position of the target user in the previous frame. That is, the initial predicted gaze position of the previous frame is adjusted by the pupil movement offset to obtain the initial predicted gaze position of the current frame. The initial predicted gaze position is a predicted gaze position determined from the pupil movement offset output by the gaze position prediction model; it is a three-dimensional position, and the three-dimensional coordinate system may be established with the center of the display interface as the origin.
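The prediction network itself is not reproduced here; the following sketch only assumes a callable offset_model that maps a window of consecutive eye images to the 3-D pupil movement offset, and shows the accumulation step described above.

import numpy as np

def update_initial_prediction(prev_initial_pred, recent_eye_frames, offset_model):
    """prev_initial_pred: initial predicted gaze position of the previous frame,
    a 3-D point in a coordinate system centred on the display interface.
    recent_eye_frames: the consecutive eye images fed to the model.
    offset_model: assumed callable returning the pupil movement offset (dx, dy, dz)."""
    offset = np.asarray(offset_model(recent_eye_frames), dtype=float)
    return np.asarray(prev_initial_pred, dtype=float) + offset   # current frame's initial prediction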
In step S604, after determining the initial predicted gaze location of the target user in the current frame, the initial predicted gaze location needs to be transformed into the coordinate system of the display interface, so as to determine the first gaze location of the target user in the display interface in the current frame.
Specifically, the first gaze position of the target user in the current frame may be determined according to a predetermined eyeball state of the target user at a plurality of preset calibration points on the display interface, and an initial predicted gaze position of the target user in the current frame.
In an exemplary embodiment, as shown in fig. 8, step S604 may include: step S800, determining the eyeball state of the current frame of the target user according to the eye image of the current frame of the target user; step S802, determining at least three preset calibration points adjacent to the initial predicted fixation position according to the eyeball state of the current frame of the target user and the predetermined eyeball states of a plurality of preset calibration points of the target user on the display interface; step S804, determining the first gaze location according to the initial predicted gaze location and at least three of the preset calibration points adjacent to the initial predicted gaze location.
In step S800, an eyeball state of the current frame of the target user is determined according to the eye image of the current frame of the target user, where the eyeball state may be represented by an offset of a pupil from an eyeball center.
In step S802, the eyeball state of the current frame of the target user is compared with predetermined eyeball states of a plurality of preset calibration points of the target user viewing the display interface, at least three eyeball states close to the eyeball state of the current frame of the target user are selected from the eyeball states of the plurality of preset calibration points, and then the preset calibration points corresponding to the at least three eyeball states are used as the at least three preset calibration points adjacent to the initial predicted fixation position.
As an example, three eyeball states close to the eyeball state of the current frame of the target user are selected from the eyeball states of a plurality of preset calibration points, and then the preset calibration points corresponding to the three eyeball states are used as the three preset calibration points adjacent to the initial predicted fixation position.
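Assuming each eyeball state is summarized as the pupil's offset from the eyeball center, a minimal sketch of choosing the calibration points whose recorded states are closest to the current state could look as follows; the names and the Euclidean distance measure are illustrative.

import numpy as np

def nearest_calibration_points(current_state, calib_states, calib_points, k=3):
    """current_state: pupil offset of the current frame, e.g. (dx, dy).
    calib_states: recorded pupil offsets, one per preset calibration point.
    calib_points: the calibration points' positions on the display interface.
    Returns the k calibration points with the most similar eyeball states."""
    dists = [np.linalg.norm(np.asarray(current_state) - np.asarray(s)) for s in calib_states]
    order = np.argsort(dists)[:k]
    return [calib_points[i] for i in order]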
In the step S804, the first gaze location is determined according to the initial predicted gaze location and at least three of the preset calibration points adjacent to the initial predicted gaze location. Namely, the initial predicted gaze position is located on a spatial plane where a display interface is located according to at least three preset calibration points adjacent to the initial predicted gaze position, so that a first gaze position is obtained.
Illustratively, the initial predicted gaze location of the target user for a current frame is adjusted, and the adjusted initial predicted gaze location is taken as the first gaze location when the sum of the distances of the adjusted initial predicted gaze location and the at least three preset calibration points reaches a minimum. In other words, after obtaining the initial predicted gaze location, the first gaze location is obtained by calculating the distances of the location from at least three nearest predetermined calibration points and suitably modifying the location such that the sum of the at least three distances is minimal.
As an example, the first gaze location P may be determined by the following formula:
min((dis(P-P1)-L1)+(dis(P-P2)-L2)+(dis(P-P3)-L3)+dis(P-P0))
where P0 is the initial predicted gaze position of the target user in the current frame, which may be regarded as the predicted position output by the gaze position prediction model; P1, P2 and P3 are the three preset calibration points adjacent to the initial predicted gaze position P0; and L1, L2 and L3 are the distances between P0 and P1, P2 and P3, respectively. Solving the above minimization yields the closest approximation of the human eye gaze position, which is taken as the first gaze position.
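As a non-limiting illustration of solving this minimization numerically, the sketch below uses scipy's Nelder-Mead optimizer; the optimizer choice and the assumption that P0 has already been expressed in display-interface coordinates are not part of the description.

import numpy as np
from scipy.optimize import minimize

def refine_gaze(p0, calib_pts):
    """p0: initial predicted gaze position on the display-interface plane (2-D).
    calib_pts: the three preset calibration points P1, P2, P3 adjacent to P0."""
    p0 = np.asarray(p0, dtype=float)
    calib_pts = [np.asarray(p, dtype=float) for p in calib_pts]
    dists = [np.linalg.norm(p0 - pt) for pt in calib_pts]        # L1, L2, L3

    def cost(p):
        # (dis(P-P1)-L1) + (dis(P-P2)-L2) + (dis(P-P3)-L3) + dis(P-P0)
        return (sum(np.linalg.norm(p - pt) - l for pt, l in zip(calib_pts, dists))
                + np.linalg.norm(p - p0))

    return minimize(cost, p0, method="Nelder-Mead").x            # first gaze position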
Referring to fig. 9, fig. 9 schematically illustrates a preset calibration point of a display interface. In the present application, 5 calibration points (circle areas in fig. 9) are preset on the display interface, and are respectively located at the center and four corner positions of the display interface.
In an exemplary embodiment, when the target user uses an electronic device configured with the image processing method of the present application, calibration may be performed in advance using the preset calibration points, so as to determine the distance between the target user and the display interface and the eyeball state when viewing each preset calibration point, thereby enabling the method steps that involve the calibration points. As shown in fig. 10, the method further includes: step S1000, obtaining an eyeball image of the target user when the preset calibration point on the display interface is taken as the gaze position; step S1002, determining the eyeball state of the target user when viewing the preset calibration point on the display interface according to the eyeball image of the target user when that preset calibration point is taken as the gaze position.
In the step S1000, the eyeball image of the target user when the preset calibration point on the display interface is the gazing position may be obtained by first acquiring a face image of the target user when the target user watches each preset calibration point, and then obtaining the eyeball image of the target user when the preset calibration point on the display interface is the gazing position based on the face image in the above manner.
In the step S1002, an eyeball state of the target user when viewing the preset calibration point on the display interface is determined according to the eyeball image of the target user when the preset calibration point on the display interface is taken as the gaze position, where the eyeball state may be represented by an offset of a pupil relative to an eyeball center.
Further, in the exemplary embodiment, the eyeball-movement distance of the target user viewing each preset calibration point may also be acquired in the above step S1002. Taking fig. 9 as an example of the preset calibration point, the eyeball movement distance when the target user views the calibration point at the center can be regarded as 0, and the eyeball movement distance when viewing the other 4 calibration points can be obtained on the basis of this. If the moving distance of the eyeballs is large (namely, the eyeballs need to be moved in a large range) when the user watches different calibration points, the user is close to the display interface, and therefore the visual range on the display interface is small; conversely, if the moving distance of the eyeball is small (i.e. the eyeball does not need to be moved widely) when the user views different calibration points, the user is far away from the display interface, and therefore the visual range on the display interface is large.
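A possible sketch of the calibration pass is shown below; it assumes a helper capture_pupil_offset that returns the pupil offset while the user fixates a given point, and the closeness score is only an illustrative stand-in for the distance estimation described above.

import numpy as np

def run_calibration(calib_points, capture_pupil_offset):
    """calib_points: positions of the preset calibration points, with the screen
    centre first and the four corners after it (as in Fig. 9).
    Returns the recorded eyeball states and a rough viewing-distance indicator."""
    states = [np.asarray(capture_pupil_offset(p), dtype=float) for p in calib_points]
    centre_state = states[0]                                   # state while fixating the centre
    move_dists = [np.linalg.norm(s - centre_state) for s in states[1:]]
    # larger eyeball movement between calibration points -> the user sits closer to the screen
    closeness = float(np.mean(move_dists))
    return states, closeness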
In an exemplary embodiment, to facilitate determining the first gaze location of the target user on the display interface, as shown in fig. 11, step S302 further includes: step S1100, determining an initial predicted gaze location of a first frame of the target user according to a predetermined eyeball state of the target user viewing a plurality of preset calibration points on the display interface and the eyeball image of the first frame of the target user.
That is, after the eyeball image of the first frame of the target user is acquired, because only one frame of eye image exists at that time, the pupil offset cannot be output by the pre-trained gaze position prediction model; the initial predicted gaze position of the first frame therefore needs to be determined from the eyeball states of the plurality of preset calibration points and the eyeball state of the first-frame eye image.
The initial predicted gaze position of the first frame may be determined by first finding, among the eyeball states of the plurality of preset calibration points, the three eyeball states most similar to the eyeball state of the first frame, and then determining the initial predicted gaze position of the first frame from the three preset calibration points corresponding to those eyeball states, for example by taking the point whose sum of distances from the three preset calibration points is smallest as the initial predicted gaze position of the first frame.
After the initial predicted fixation position of the first frame is determined, for each frame of eye image, the corresponding initial predicted fixation position can be determined by adding the initial predicted fixation position of the previous frame and the pupil movement offset of the current frame.
In an exemplary embodiment, as shown in fig. 12, step S306 may include: step S1200, rendering the first area in the image to be displayed with original image quality to obtain a first rendered image; step S1202, down-sampling the second area in the image to be displayed or the image to be displayed, and rendering the down-sampled second area in the image to be displayed or the image to be displayed to obtain a second rendered image.
In the step S1200, a first rendered image is obtained by rendering the first area in the image to be displayed at the original image quality. For example, the image content in the first area is rendered at the original resolution of the image to be displayed to obtain the first rendered image; if the original resolution of the image to be displayed is 4K, the content in the first area is rendered at 4K resolution.
In the step S1202, down-sampling is performed on the second area in the image to be displayed or the image to be displayed, and the down-sampled second area in the image to be displayed or the image to be displayed is rendered to obtain a second rendered image.
As an example, down-sampling the second area in the image to be displayed, for example, reducing the resolution from 4K to 1080P, and then rendering the down-sampled second area in the image to be displayed to obtain a second rendered image.
As another example, the image to be displayed is directly downsampled, for example, the resolution is reduced from 4K to 1080P, and the downsampled image to be displayed is rendered to obtain a second rendered image.
According to the scheme, the high-precision rendering is carried out on the first area corresponding to the first watching position to ensure the watching experience of a user, the down-sampling low-resolution rendering is carried out on the second area outside the first area to reduce the performance pressure of equipment, and the image rendering efficiency is improved.
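As a non-limiting illustration of steps S1200 and S1202, the following sketch uses OpenCV resizing as a stand-in for the renderer: the first area keeps the original resolution while the rest of the frame is down-sampled and scaled back up for display; the down-sampling factor is an illustrative assumption.

import cv2

def foveated_frame(frame, first_region, down_scale=0.5):
    """frame: the image to be displayed, at original quality.
    first_region: (x0, y0, x1, y1) area around the first gaze position.
    down_scale: illustrative down-sampling factor for the second area."""
    h, w = frame.shape[:2]
    # second rendered image: rendering cost is paid at the reduced resolution,
    # the result is scaled back up only for display
    small = cv2.resize(frame, (int(w * down_scale), int(h * down_scale)),
                       interpolation=cv2.INTER_AREA)
    low = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
    # first rendered image: the first area at original quality, pasted over the low-quality frame
    x0, y0, x1, y1 = first_region
    out = low.copy()
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out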
As an example, taking rectangular division, the image to be displayed is uniformly divided into nine 3 × 3 regions; when the gaze position of the human eye is within a certain region, that region is rendered with high precision, and when the gaze point is not inside a region, that region is rendered with low precision. This approach can improve rendering efficiency by approximately 70%. For example, for a 4K image down-sampled to 1080p, the scheme only needs to render 1920 × 1080 + (4096 × 2160) / 9 = 3,056,640 pixels, while the full 4K image has 8,847,360 pixels, so only about 34.5% of the pixels need to be rendered.
As another example, with circular division, a small circular high-precision region is superimposed on the down-sampled low-resolution image. The center of the circular region is the gaze point of the human eye, and the region must change position at any time as the gaze point moves. Compared with a fixed division, this scheme renders an even smaller high-precision area and can further improve performance.
In an exemplary embodiment, in order to make the first target image transition naturally between the first rendered image and the second rendered image, as shown in fig. 13, step S1206 may include: step S1300, feathering the edge of the first rendered image; step S1302, superimposing the feathered first rendered image and the second rendered image to obtain the first target image.
With this scheme, the feathered edge of the upper high-precision image is superimposed on the lower down-sampled image, and the two images are blended smoothly, preventing a strong tearing sensation at the junction between the high-precision and low-precision images.
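A minimal sketch of the feathered merge is given below, using a blurred circular mask so that the high-precision patch fades into the down-sampled background; the kernel size and the use of a circular first area are illustrative assumptions.

import cv2
import numpy as np

def merge_with_feather(high_img, low_img, gaze_xy, radius, feather=51):
    """high_img / low_img: first and second rendered images, same size, 3-channel.
    gaze_xy: first gaze position (circle centre); radius: radius of the first area.
    feather: odd Gaussian kernel size controlling how soft the edge is."""
    h, w = low_img.shape[:2]
    mask = np.zeros((h, w), dtype=np.float32)
    cv2.circle(mask, (int(gaze_xy[0]), int(gaze_xy[1])), int(radius), 1.0, thickness=-1)
    mask = cv2.GaussianBlur(mask, (feather, feather), 0)[..., None]   # feathered edge
    blended = mask * high_img.astype(np.float32) + (1.0 - mask) * low_img.astype(np.float32)
    return blended.astype(low_img.dtype)                              # first target image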
Because determining the gaze position of the human eye requires a certain amount of computation, it cannot be obtained fully in real time and some delay remains. Therefore, in an exemplary embodiment, the gaze position needs to be predicted to some extent, so as to prevent the actual high-precision rendering area from failing to keep up with changes in the user's gaze position and affecting the viewing experience. As shown in fig. 14, the method further includes: step S1400, determining motion parameters of the first gaze position according to a plurality of consecutive first gaze positions of the target user on the display interface before the current moment; step S1402, determining a second gaze position of the target user at the current moment according to the motion parameters of the first gaze position and the plurality of consecutive first gaze positions before the current moment; step S1404, when the first gaze position of the target user at the current moment has not been determined, determining, according to the second gaze position, a third area corresponding to the second gaze position in the image to be displayed and a fourth area except the third area in the image to be displayed; step S1406, rendering the third area and the fourth area in the image to be displayed respectively to obtain a third rendered image and a fourth rendered image, and combining the third rendered image and the fourth rendered image to obtain a second target image, wherein the quality of the third rendered image is higher than that of the fourth rendered image.
In step S1400, a plurality of consecutive first gaze locations of the target user on the display interface before the current time are determined according to a plurality of consecutive frames of eye images before the current time, where each frame of eye image corresponds to one first gaze location, and according to the plurality of consecutive first gaze locations, motion parameters of the first gaze location, such as a motion speed and an acceleration of the first location, may be determined.
In step S1402, after the motion parameter of the first position is determined, the second gaze position at the current time can be obtained by performing correction based on a plurality of consecutive first gaze positions according to the motion parameter.
Illustratively, taking three consecutive first gaze positions as an example, the second gaze position at the current time may be determined by the following formula:
P = P2 + (P2 - P1) × t + ((P2 - P1) - (P1 - P0)) × t × t / 2
where P is the second gaze position at the current time, P0, P1 and P2 are the three consecutive first gaze positions before the current time (corresponding to three consecutive frames of eye images), and t is the unit time, taken as 1.
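As a non-limiting illustration, the sketch below extrapolates the second gaze position with a constant-acceleration model, assuming the motion parameters are the velocity and acceleration estimated from the three most recent first gaze positions; this concrete form is an assumption consistent with, but not dictated by, the description.

import numpy as np

def predict_gaze(p0, p1, p2, t=1.0):
    """p0, p1, p2: the three consecutive first gaze positions before the current
    time, oldest first. Returns the second gaze position extrapolated with the
    estimated velocity and acceleration of the gaze point."""
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    velocity = p2 - p1
    acceleration = (p2 - p1) - (p1 - p0)
    return p2 + velocity * t + 0.5 * acceleration * t * t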
In the foregoing steps S1404 to S1406, if the first gaze location of the target user on the display interface at the current time is not calculated yet, the image to be displayed is rendered according to the second gaze location at the current time, which is inferred according to the multiple consecutive first gaze locations before the current time in the steps S1400 to S1402, and the rendering manner is similar to that when rendering is performed according to the first gaze location, and is not described herein again.
In an exemplary embodiment, the eye image is acquired by a monocular camera. Eye tracking is achieved with a single monocular camera, so that local high-precision rendering of the displayed image is performed according to the gaze position of the human eye. This guarantees the user's viewing experience while reducing the performance pressure on the device and lowering cost, and no complex and expensive eye-tracking equipment is needed.
Example two
Fig. 15 schematically shows a block diagram of an image processing apparatus according to the second embodiment of the present application, which may be divided into one or more program modules, the one or more program modules being stored in a storage medium and executed by one or more processors to complete the embodiments of the present application. The program modules referred to in the embodiments of the present application refer to a series of computer program instruction segments that can perform specific functions, and the following description will specifically describe the functions of each program module in the embodiments.
As shown in fig. 15, the image processing apparatus 1500 may include an eye image acquisition module 1510, a first gaze location determination module 1520, a region determination module 1530, and a rendering module 1540.
The eye image obtaining module 1510 is configured to obtain an eye image of a target user through a camera, where the eye image is a continuous multi-frame image.
A first gaze location determination module 1520, configured to determine a first gaze location of the target user on a display interface according to the eye image.
A region determining module 1530, configured to determine, according to the first gaze location, a first region corresponding to the first gaze location in the image to be displayed, and a second region except the first region in the image to be displayed.
A rendering module 1540, configured to respectively render the first region and the second region in the image to be displayed to obtain a first rendered image and a second rendered image, and combine the first rendered image and the second rendered image to obtain a first target image, where the quality of the first rendered image is higher than that of the second rendered image.
In an exemplary embodiment, the eye image acquisition module 1510 is configured to: acquiring a face image of the target user; carrying out binarization processing on the face image by using at least two different thresholds to obtain at least two binarized face images; and determining the eye image of the target user according to the at least two binarization face images.
In an exemplary embodiment, the eye image acquisition module 1510 is further configured to: determining a human eye pupil region according to the circular regions in at least two binarization human face images; determining the eye image according to the human eye pupil region.
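Illustratively, selecting a pupil region from several binarized images might be sketched as follows; the threshold values, the circularity test and the averaging of candidates are illustrative assumptions and not part of the described embodiment.

```python
import cv2
import numpy as np

def locate_pupil(face_gray, thresholds=(30, 50, 70)):
    """Binarize a grayscale face image at several thresholds and keep the
    roughly circular dark blobs that appear as pupil candidates."""
    candidates = []
    for t in thresholds:
        _, binary = cv2.threshold(face_gray, t, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            area = cv2.contourArea(c)
            perimeter = cv2.arcLength(c, True)
            if perimeter == 0 or area < 20:
                continue
            circularity = 4 * np.pi * area / perimeter ** 2  # 1.0 for a perfect circle
            if circularity > 0.7:
                (x, y), r = cv2.minEnclosingCircle(c)
                candidates.append((x, y, r))
    if not candidates:
        return None
    # Average the candidates found across thresholds as the pupil estimate.
    return tuple(np.mean(candidates, axis=0))
```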
In the exemplary embodiment, first gaze location determination module 1520 is configured to: processing the eye image of the current frame by using a pre-trained gaze position prediction model to obtain the eye pupil movement offset of the target user of the current frame; determining the initial predicted gaze position of the target user of the current frame according to the initial predicted gaze position of the target user of the previous frame and the human eye pupil movement offset of the target user of the current frame; determining the first gaze position of the target user of the current frame according to the initial predicted gaze position of the target user of the current frame and the predetermined eyeball states of the target user when watching a plurality of preset calibration points on the display interface.
In the exemplary embodiment, first gaze location determination module 1520 is further configured to: determining the eyeball state of the current frame of the target user according to the eye image of the current frame of the target user; determining at least three preset calibration points adjacent to the initial predicted gaze position according to the eyeball state of the current frame of the target user and the predetermined eyeball states of the target user when watching a plurality of preset calibration points on the display interface; and determining the first gaze location according to the initial predicted gaze location and the at least three preset calibration points adjacent to the initial predicted gaze location.
In the exemplary embodiment, first gaze location determination module 1520 is also configured to: and adjusting the initial predicted gaze position of the target user of the current frame, and taking the adjusted initial predicted gaze position as the first gaze position when the sum of the distances between the adjusted initial predicted gaze position and the at least three preset calibration points reaches a minimum.
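Illustratively, one possible reading of this adjustment is a Weiszfeld-style iteration that moves the initial predicted position toward the point minimizing the sum of distances to the nearby calibration points; the sketch below is an assumption about how such an adjustment could be performed, not the claimed method.

```python
import numpy as np

def correct_gaze(initial_pos, calib_points, iters=50, eps=1e-6):
    """Iteratively adjust the initial predicted gaze position so that the sum of
    its distances to the nearby calibration points is (approximately) minimized."""
    p = np.asarray(initial_pos, dtype=float)
    pts = np.asarray(calib_points, dtype=float)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(pts - p, axis=1), eps)  # avoid divide-by-zero
        w = 1.0 / d
        p_new = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(p_new - p) < eps:
            break
        p = p_new
    return tuple(p)
```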
In an exemplary embodiment, the eye image acquisition module 1510 is further configured to: acquiring an eyeball image of the target user when the preset calibration point on the display interface is taken as a fixation position; and determining the eyeball state of the target user when the target user watches the preset calibration point on the display interface according to the eyeball image of the target user when the preset calibration point on the display interface is taken as the gazing position.
In the exemplary embodiment, first gaze location determination module 1520 is further configured to: and determining an initial predicted fixation position of the first frame of the target user according to the predetermined eyeball states of the target user viewing a plurality of preset calibration points on the display interface and the eyeball image of the first frame of the target user.
In the exemplary embodiment, the region determination module 1530 is configured to: and dividing the image to be displayed into a plurality of rectangular areas, and taking an area including the first gaze position as the first area and an area not including the first gaze position as the second area.
In the exemplary embodiment, region determination module 1530 is configured to: taking a circular area of a certain radius in the image to be displayed, centered on the first gaze location, as the first area, and taking the area outside the first area as the second area.
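Illustratively, a circular foveal mask of this kind can be computed as below; the mask-based formulation and the parameter names are illustrative assumptions.

```python
import numpy as np

def foveal_mask(img_h, img_w, gaze_xy, radius):
    """Boolean mask that is True inside the circular first (foveal) area centered
    at the gaze position and False in the second (peripheral) area."""
    ys, xs = np.mgrid[0:img_h, 0:img_w]
    gx, gy = gaze_xy
    return (xs - gx) ** 2 + (ys - gy) ** 2 <= radius ** 2
```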
In an exemplary embodiment, the rendering module 1540 is configured to: and downsampling the second area in the image to be displayed or the image to be displayed, and rendering the downsampled second area in the image to be displayed or the image to be displayed to obtain a second rendered image.
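Illustratively, reduced-quality rendering of the peripheral content by down-sampling might look like the following; the scale factor and interpolation modes are illustrative assumptions.

```python
import cv2

def render_low_quality(frame, scale=0.5):
    """Down-sample the frame and scale it back up, so the second area is
    effectively shown at reduced quality."""
    h, w = frame.shape[:2]
    small = cv2.resize(frame, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
```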
In an exemplary embodiment, the rendering module 1540 is configured to: feathering an edge of the first rendered image; and superposing the first rendering image and the second rendering image which are subjected to the feathering processing to obtain the first target image.
In an exemplary embodiment, the apparatus further comprises a second gaze location determination module (not shown) for: determining a motion parameter of the first gaze location according to a plurality of continuous first gaze locations of the target user on a display interface before the current time; determining a second gaze position of the target user at the current time according to the motion parameters of the first gaze position and a plurality of consecutive first gaze positions before the current time; the region determination module 1530 is further configured to: when the first watching position of the target user at the current moment is not determined, determining a third area corresponding to the second watching position in the image to be displayed and a fourth area except the third area in the image to be displayed according to the second watching position; rendering module 1540 is further configured to: rendering the third area and the fourth area in the image to be displayed respectively to obtain a third rendering image and a fourth rendering image, and combining the third rendering image and the fourth rendering image to obtain a second target image, wherein the image quality of the third rendering image is higher than that of the fourth rendering image.
In an exemplary embodiment, the camera is a monocular camera.
Example three
Fig. 16 schematically shows a hardware architecture diagram of a computer device 10000 suitable for the image processing method according to the third embodiment of the present application. The computer device 10000 may be a live streaming server or a live streaming terminal, or a part of one. The computer device 10000 can be a device capable of automatically performing numerical calculation and/or data processing according to instructions set in advance or stored. For example, it may be a rack server, a blade server, a tower server or a cabinet server (including an independent server or a server cluster composed of a plurality of servers), a gateway, and the like. As shown in fig. 16, the computer device 10000 at least includes, but is not limited to: a memory 10010, a processor 10020, and a network interface 10030, which may be communicatively linked to each other via a system bus. Wherein:
the memory 10010 includes at least one type of computer-readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the storage 10010 may be an internal storage module of the computer device 10000, such as a hard disk or a memory of the computer device 10000. In other embodiments, the memory 10010 may also be an external storage device of the computer device 10000, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the computer device 10000. Of course, the memory 10010 may also include both internal and external memory modules of the computer device 10000. In this embodiment, the memory 10010 is generally used for storing an operating system installed in the computer device 10000 and various types of application software, such as program codes of an image processing method. In addition, the memory 10010 may also be used to temporarily store various types of data that have been output or are to be output.
Processor 10020 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or other data Processing chip in some embodiments. The processor 10020 is generally configured to control overall operations of the computer device 10000, such as performing control and processing related to data interaction or communication with the computer device 10000. In this embodiment, the processor 10020 is configured to execute the program code stored in the memory 10010 or process data.
Network interface 10030 may comprise a wireless network interface or a wired network interface, and network interface 10030 is generally used to establish a communication link between computer device 10000 and other computer devices. For example, the network interface 10030 is used to connect the computer device 10000 to an external terminal through a network, establish a data transmission channel and a communication link between the computer device 10000 and the external terminal, and the like. The network may be a wireless or wired network such as an Intranet (Intranet), the Internet (Internet), a Global System of Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth (Bluetooth), or Wi-Fi.
It should be noted that fig. 16 only shows a computer device having the components 10010-10030, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the image processing method stored in the memory 10010 can be further divided into one or more program modules and executed by one or more processors (in this embodiment, the processor 10020) to complete the embodiment of the present application.
Example four
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the image processing method in the embodiments.
In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer readable storage medium may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer readable storage medium may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device. Of course, the computer-readable storage medium may also include both internal and external storage devices of the computer device. In the present embodiment, the computer-readable storage medium is generally used for storing an operating system and various types of application software installed in the computer device, for example, the program codes of the image processing method in the embodiment, and the like. Further, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present application described above may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device, and in some cases the steps shown or described may be performed in an order different from that described herein. They may also be separately fabricated into individual integrated circuit modules, or multiple of them may be fabricated into a single integrated circuit module. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all the equivalent structures or equivalent processes that can be directly or indirectly applied to other related technical fields by using the contents of the specification and the drawings of the present application are also included in the scope of the present application.

Claims (17)

1. An image processing method, characterized in that the method comprises:
acquiring an eye image of a target user, wherein the eye image is a continuous multi-frame image;
determining a first fixation position of the target user on a display interface according to the eye image;
determining a first area corresponding to the first gaze position in an image to be displayed and a second area except the first area in the image to be displayed according to the first gaze position;
rendering the first area and the second area in the image to be displayed respectively to obtain a first rendering image and a second rendering image, and combining the first rendering image and the second rendering image to obtain a first target image, wherein the image quality of the first rendering image is higher than that of the second rendering image.
2. The image processing method according to claim 1, wherein the acquiring an eye image of a target user comprises:
acquiring a face image of the target user;
using at least two different thresholds to carry out binarization processing on the face image so as to obtain at least two binarization face images;
and determining the eye image of the target user according to the at least two binarization face images.
3. The image processing method according to claim 2, wherein said determining the eye image of the target user from at least two of the binarized face images comprises:
determining a human eye pupil region according to the circular regions in at least two binarization human face images;
determining the eye image according to the human eye pupil region.
4. The image processing method according to any one of claims 1 to 3, wherein the determining a first gaze location of the target user on a display interface from the eye image comprises:
processing the eye image of the current frame by using a pre-trained gaze position prediction model to obtain the eye pupil movement offset of the target user of the current frame;
determining the initial predicted gaze position of the target user of the current frame according to the initial predicted gaze position of the target user of the previous frame and the human eye pupil movement offset of the target user of the current frame;
and determining the first gaze position of the target user of the current frame according to the initial predicted gaze position of the target user of the current frame and the predetermined eyeball states of a plurality of preset calibration points of the target user on the display interface.
5. The image processing method according to claim 4, wherein the determining the first gaze location of the target user of the current frame according to the initial predicted gaze location of the target user of the current frame and the predetermined eyeball states of the target user when viewing a plurality of preset calibration points on the display interface comprises:
determining the eyeball state of the current frame of the target user according to the eye image of the current frame of the target user;
determining at least three preset calibration points adjacent to the initial predicted fixation position according to the eyeball state of the current frame of the target user and the predetermined eyeball states of a plurality of preset calibration points of the target user on the display interface;
and determining the first gaze location according to the initial predicted gaze location and the at least three preset calibration points adjacent to the initial predicted gaze location.
6. The image processing method according to claim 5, wherein said determining the first gaze location from the initial predicted gaze location and at least three of the preset calibration points adjacent to the initial predicted gaze location comprises:
and adjusting the initial predicted gaze position of the target user of the current frame, and taking the adjusted initial predicted gaze position as the first gaze position when the sum of the distances between the adjusted initial predicted gaze position and the at least three preset calibration points reaches a minimum.
7. The image processing method according to claim 4, wherein before acquiring the eye image of the target user, the method further comprises:
acquiring an eyeball image of the target user when the preset calibration point on the display interface is taken as a fixation position;
and determining the eyeball state of the target user when the target user watches the preset calibration point on the display interface according to the eyeball image of the target user when the preset calibration point on the display interface is taken as the gazing position.
8. The image processing method according to claim 4, wherein the determining a first gaze location of the target user on a display interface from the eye image further comprises:
and determining an initial predicted fixation position of the first frame of the target user according to the predetermined eyeball states of the target user viewing a plurality of preset calibration points on the display interface and the eyeball image of the first frame of the target user.
9. The image processing method according to any one of claims 1 to 8, wherein the determining a first region in the image to be displayed corresponding to the first gaze location and a second region in the image to be displayed other than the first region according to the first gaze location comprises:
and dividing the image to be displayed into a plurality of rectangular areas, and taking an area including the first gaze position as the first area and an area not including the first gaze position as the second area.
10. The image processing method according to any one of claims 1 to 8, wherein the determining a first region in the image to be displayed corresponding to the first gaze location and a second region in the image to be displayed other than the first region according to the first gaze location comprises:
and taking a circular area of a certain radius in the image to be displayed, centered on the first gaze location, as the first area, and taking the area outside the first area as the second area.
11. The image processing method according to any one of claims 1 to 8, wherein the rendering the first area and the second area in the image to be displayed to obtain a first rendered image and a second rendered image respectively comprises:
rendering the first area in the image to be displayed according to the original image quality to obtain a first rendered image;
and downsampling the second area in the image to be displayed or the image to be displayed, and rendering the downsampled second area in the image to be displayed or the image to be displayed to obtain a second rendered image.
12. The image processing method according to any one of claims 1 to 8, wherein said merging the first rendered image and the second rendered image to obtain a first target image comprises:
feathering an edge of the first rendered image;
and superposing the first rendering image and the second rendering image which are subjected to the feathering processing to obtain the first target image.
13. The image processing method according to any one of claims 1 to 12, characterized in that the method further comprises:
determining a motion parameter of the first gaze location according to a plurality of continuous first gaze locations of the target user on a display interface before the current time;
determining a second gaze position of the target user at the current time according to the motion parameters of the first gaze position and a plurality of consecutive first gaze positions before the current time;
when the first watching position of the target user at the current moment is not determined, determining a third area corresponding to the second watching position in the image to be displayed and a fourth area except the third area in the image to be displayed according to the second watching position;
rendering the third area and the fourth area in the image to be displayed respectively to obtain a third rendered image and a fourth rendered image, and combining the third rendered image and the fourth rendered image to obtain a second target image, wherein the quality of the third rendered image is higher than that of the fourth rendered image.
14. The image processing method according to any one of claims 1 to 13, wherein the eye image is acquired by a monocular camera.
15. An image processing apparatus characterized by comprising:
the eye image acquisition module is used for acquiring an eye image of a target user, wherein the eye image is a continuous multi-frame image;
the first fixation position determining module is used for determining a first fixation position of the target user on a display interface according to the eye image;
the area determining module is used for determining a first area corresponding to the first watching position in the image to be displayed and a second area except the first area in the image to be displayed according to the first watching position;
and the rendering module is used for respectively rendering the first area and the second area in the image to be displayed to obtain a first rendering image and a second rendering image, and combining the first rendering image and the second rendering image to obtain a first target image, wherein the quality of the first rendering image is higher than that of the second rendering image.
16. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, is adapted to carry out the steps of the image processing method according to any of claims 1 to 14.
17. A computer-readable storage medium, having stored thereon a computer program, the computer program being executable by at least one processor to cause the at least one processor to perform the steps of the image processing method of any one of claims 1 to 14.
CN202210226571.1A 2022-03-09 2022-03-09 Image processing method and device Pending CN114610150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210226571.1A CN114610150A (en) 2022-03-09 2022-03-09 Image processing method and device

Publications (1)

Publication Number Publication Date
CN114610150A true CN114610150A (en) 2022-06-10

Family

ID=81861745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210226571.1A Pending CN114610150A (en) 2022-03-09 2022-03-09 Image processing method and device

Country Status (1)

Country Link
CN (1) CN114610150A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3506149A1 (en) * 2017-12-27 2019-07-03 Fundacion Centro De Tecnologias De Interaccion Visual Y Comunicaciones Vicomtech Method, system and computer program product for eye gaze direction estimation
CN108732764A (en) * 2018-06-06 2018-11-02 北京七鑫易维信息技术有限公司 A kind of intelligent glasses, the method for tracing of eye trajectory, device and storage medium
CN113168235A (en) * 2018-12-14 2021-07-23 苹果公司 Gaze-driven video recording
CN110855972A (en) * 2019-11-21 2020-02-28 Oppo广东移动通信有限公司 Image processing method, electronic device, and storage medium
CN111580665A (en) * 2020-05-11 2020-08-25 Oppo广东移动通信有限公司 Method and device for predicting fixation point, mobile terminal and storage medium
CN113992885A (en) * 2021-09-22 2022-01-28 联想(北京)有限公司 Data synchronization method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Min; Tao Liang: "Detection and Localization of Human Eyes in Face Images", Opto-Electronic Engineering, no. 08, 28 August 2006 (2006-08-28), pages 33-34 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115562490A (en) * 2022-10-12 2023-01-03 西北工业大学太仓长三角研究院 Cross-screen eye movement interaction method and system for aircraft cockpit based on deep learning
CN115562490B (en) * 2022-10-12 2024-01-09 西北工业大学太仓长三角研究院 Deep learning-based aircraft cockpit cross-screen-eye movement interaction method and system

Similar Documents

Publication Publication Date Title
CN112823328B (en) Method for performing an internal and/or external calibration of a camera system
US10948726B2 (en) IPD correction and reprojection for accurate mixed reality object placement
US10839577B2 (en) Creating augmented reality self-portraits using machine learning
US9547908B1 (en) Feature mask determination for images
US8711198B2 (en) Video conference
US8330793B2 (en) Video conference
US11663762B2 (en) Preserving regions of interest in automatic image cropping
US10257485B2 (en) Generating a composite image from a physical item
JP7542740B2 (en) Image line of sight correction method, device, electronic device, and computer program
US10942567B2 (en) Gaze point compensation method and apparatus in display device, and display device
CN111556336B (en) Multimedia file processing method, device, terminal equipment and medium
US20220207751A1 (en) Patch-Based Image Matting Using Deep Learning
CN113160244B (en) Video processing method, device, electronic equipment and storage medium
US20240348928A1 (en) Image display method, device and electronic device for panorama shooting to improve the user's visual experience
CN110706283A (en) Calibration method and device for sight tracking, mobile terminal and storage medium
CN114610150A (en) Image processing method and device
CN116430992A (en) Implicit calibration method and device for gaze tracking
CN114911445B (en) Display control method for virtual reality device, and storage medium
CN108027646A (en) A kind of terminal shows anti-fluttering method and device
CN115514887A (en) Control method and device for video acquisition, computer equipment and storage medium
CN112634298B (en) Image processing method and device, storage medium and terminal
Chamaret et al. Video retargeting for stereoscopic content under 3D viewing constraints
CN113421275A (en) Image processing method, image processing device, computer equipment and storage medium
CN112541506A (en) Method, device, equipment and medium for correcting text image
CN112085002A (en) Portrait segmentation method, portrait segmentation device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination