WO2022261856A1

WO2022261856A1 - Image processing method and apparatus, and storage medium

Info

Publication number: WO2022261856A1
Application number: PCT/CN2021/100351
Authority: WO
Inventors: 代具亭; 皮志明
Original assignee: 华为技术有限公司
Priority date: 2021-06-16
Filing date: 2021-06-16
Publication date: 2022-12-22
Also published as: CN115707355A

Abstract

The present application relates to an image processing method and apparatus, and a storage medium. The method comprises: detecting an image to be processed, which is collected by an image collection component, and determining a face region and an eye region of a target object in said image; performing gaze detection on the face region and the eye region, so as to determine a fixation point of the target object; according to the fixation point, determining a gaze angle of the target object; and according to the gaze angle, adjusting the eye region to obtain a target image. By means of the embodiments of the present application, the gaze of an eye in an image can be automatically detected and adjusted, such that the adjusted gaze of the eye in a target image is kept straight, thereby improving the photographing effect and the user experience.

Description

Image processing method, device and storage medium

technical field

The present application relates to the technical field of image processing, and in particular to an image processing method, device and storage medium.

Background technique

When users take selfies or video calls through electronic devices such as mobile phones, tablet computers, and smart watches, they usually look at the screen of the electronic device instead of the lens. Because the human eye sight deviates from the lens, the human eye sight in the captured photos or videos is not correct, resulting in unattractive portrait eyes and poor user experience.

Contents of the invention

In view of this, an image processing method, device and storage medium are proposed.

In the first aspect, the embodiment of the present application provides an image processing method, the method includes: detecting the image to be processed collected by the image acquisition component, and determining the face area and human face area of the target object in the image to be processed eye area: performing line-of-sight detection on the human face area and the human eye area, and determining the gaze point of the target object, the gaze point being used to indicate the position of the target object's line of sight on a preset reference plane ; According to the gaze point, determine the line-of-sight angle of the target object, the line-of-sight angle is used to indicate the offset of the gaze point relative to the reference point on the image acquisition component; according to the line-of-sight angle, the The human eye area is adjusted to obtain the target image.

The embodiment of the present application can detect the image to be processed collected by the image acquisition component, determine the face area and eye area of the target object in the image to be processed, and perform sight line detection on the face area and eye area to obtain The gaze point of the target object, and then determine the line of sight angle of the target object according to the gaze point of the target object, and adjust the human eye area according to the line of sight angle point to obtain the target image, so that the gaze point of the target object can be detected according to the image content, Then determine the line of sight angle, and adjust the eye area of the target object based on the line of sight angle, which can not only improve the detection accuracy of the line of sight angle, but also realize the line of sight adjustment in any direction, so that the line of sight of the human eye in the target image can be maintained squarely, improving Shooting effect and user experience.

According to the first aspect, in the first possible implementation manner of the image processing method, the determining the line-of-sight angle of the target object according to the gaze point includes: determining the distance between the human eye of the target object and the A first distance between the gazing points; according to the gazing point, the reference point and the first distance, determine the line-of-sight angle of the target object.

In the embodiment of the present application, the angle of sight of the target object is determined through the triangular relationship determined by the point of gaze, the first distance (i.e., the distance between the human eyes) and the reference point, which is compared with the prior art (directly inputting the image of the human face region into the network regression model to obtain Compared with line-of-sight angle), it can not only greatly reduce the detection difficulty of line-of-sight angle, but also improve the detection accuracy of line-of-sight angle.

According to the first possible implementation of the first aspect, in the second possible implementation of the image processing method, according to the gaze point, the reference point and the first distance, the determined The line-of-sight angle of the target object includes: determining a second distance between the reference point and the gaze point; and determining the line-of-sight angle of the target object according to the first distance and the second distance.

In the embodiment of the present application, by determining the second distance between the reference point and the gaze point, and determining the line-of-sight angle of the target object according to the first distance and the second distance, it is simple and fast, and the processing efficiency can be improved.

According to the first aspect, the first possible implementation of the first aspect, or the second possible implementation of the first aspect, in a third possible implementation of the image processing method, the The line of sight angle is adjusted to the human eye area to obtain the target image, including: determining the line of sight adjustment angle according to the line of sight angle and the reference point; determining the line of sight transformation according to the line of sight adjustment angle and the human eye area relationship; according to the line-of-sight transformation relationship, the human eye area is adjusted to obtain the target image.

The embodiment of the present application can determine the line of sight adjustment angle according to the line of sight angle and the reference point, and determine the line of sight transformation relationship according to the line of sight adjustment angle and the human eye area, and then adjust the human eye area according to the line of sight transformation relationship to obtain the target image , so that the line-of-sight conversion relationship can be determined, and the line-of-sight conversion relationship can be directly applied to the human eye area in the image to be processed, so as to realize the line-of-sight adjustment of an image with any resolution.

According to any one of the first aspect or the first possible implementation manner of the first aspect to the third possible implementation manner of the first aspect, in a fourth possible implementation manner of the image processing method, The line of sight detection is realized through neural network detection.

In the embodiment of the present application, the neural network is used to detect the gaze of the face area and the eye area to obtain the gaze point of the target object, which can not only improve the processing efficiency, but also improve the accuracy of the gaze point of the target object.

According to the third possible implementation manner of the first aspect, in the fifth possible implementation manner of the image processing method, the determination of the line-of-sight transformation relationship is implemented through neural network processing.

In the embodiments of the present application, the neural network is used to determine the line-of-sight transformation relationship, which can not only improve processing efficiency, but also improve the accuracy of the line-of-sight transformation relationship.

According to the first aspect, in the sixth possible implementation of the image processing method, the detection of the image to be processed collected by the image acquisition component is performed, and the face area and the face area of the target object in the image to be processed are determined. The human eye area includes: performing face detection on the image to be processed collected by the image acquisition component to obtain the face area of the target object in the image to be processed; performing face key point detection on the face area to obtain the The human face key points of the target object; according to the human eye key points in the human face key points, determine the human eye area of the target object in the image to be processed.

In the embodiment of the present application, the face area and eye area of the target object are determined through face detection and face key point detection, which can improve processing efficiency.

According to the sixth possible implementation manner of the first aspect, in the seventh possible implementation manner of the image processing method, performing line-of-sight detection on the face area and the human eye area to determine the The gaze point of the target object includes: determining the head posture of the target object according to the key points of the human face; judging whether the head posture satisfies a preset condition, and the preset condition includes pitch in the head posture The angle is less than or equal to the preset pitch angle threshold and the roll angle is less than or equal to the preset roll angle threshold; when the head posture satisfies the preset condition, the human face area and the human eyes Line-of-sight detection is performed in the region to determine the gaze point of the target object.

In the embodiments of the present application, through head pose detection, it is possible to filter photos in the scene of a large-angle head pose of the target object when the human eye sight is adjusted, thereby reducing interference with the user's shooting intention and improving user experience.

According to the sixth possible implementation manner of the first aspect, in the eighth possible implementation manner of the image processing method, performing line-of-sight detection on the human face area and the human eye area to determine the The gaze point of the target object includes: judging whether the key points of the human eye in the key points of the human face are complete; in the case that the key points of the human eye are complete, perform line-of-sight on the human face area and the human eye area Detecting, determining a gaze point of the target object.

In the embodiment of the present application, by judging the key points of the human eyes, it is possible to perform line-of-sight detection on the face area and the human eye area of the target object under the condition that the eyes of the target object are not blocked, and then determine the gaze point, thereby improving the target object's eyesight. Gaze point accuracy.

According to the first aspect or any one of multiple possible implementation manners of the first aspect, in a ninth possible implementation manner of the image processing method, the reference plane includes a plane where the reference point is located.

In the embodiment of the present application, the plane where the reference point on the image acquisition component is located is used as the reference plane, which is more closely integrated with the actual application scene. According to the reference plane, the gaze point of the target object can be determined to improve the accuracy of the gaze point.

In a second aspect, an embodiment of the present application provides an image processing device, the device comprising: an image acquisition component, configured to acquire an image of a target object to obtain an image to be processed; a processing component configured to: Detecting the image to be processed, determining the face area and eye area of the target object in the image to be processed; performing line-of-sight detection on the face area and the eye area, and determining the gaze point of the target object, The gaze point is used to indicate the position of the gaze of the target object on the preset reference plane; according to the gaze point, the gaze angle of the target object is determined, and the gaze angle is used to indicate that the gaze point is relatively The offset of the reference point on the image acquisition component; according to the line of sight angle, the human eye area is adjusted to obtain a target image.

The embodiment of the present application can detect the image to be processed collected by the image acquisition component, determine the face area and eye area of the target object in the image to be processed, and perform sight line detection on the face area and eye area to obtain The gaze point of the target object, and then determine the line of sight angle of the target object according to the gaze point of the target object, and adjust the human eye area according to the line of sight angle to obtain the target image, so that the gaze point of the target object can be detected according to the image content, and then Determine the line of sight angle, and adjust the human eye area of the target object based on the line of sight angle, which can not only improve the detection accuracy of the line of sight angle, but also realize the adjustment of the line of sight in any direction, so that the human eye line of sight in the target image can maintain a square view and improve shooting effect and user experience.

According to the second aspect, in the first possible implementation manner of the image processing device, the determining the line-of-sight angle of the target object according to the gaze point includes: determining the distance between the human eye of the target object and the A first distance between the gazing points; according to the gazing point, the reference point and the first distance, determine the line-of-sight angle of the target object.

According to the first possible implementation manner of the second aspect, in the second possible implementation manner of the image processing device, the determination of the The line-of-sight angle of the target object includes: determining a second distance between the reference point and the gaze point; and determining the line-of-sight angle of the target object according to the first distance and the second distance.

In the embodiment of the present application, by determining the second distance between the reference point and the gaze point, and determining the line-of-sight angle of the target object according to the first distance and the second distance, it is simple and fast, and can improve processing efficiency.

According to the second aspect, the first possible implementation of the second aspect, or the second possible implementation of the second aspect, in a third possible implementation of the image processing device, the The line of sight angle is adjusted to the human eye area to obtain the target image, including: determining the line of sight adjustment angle according to the line of sight angle and the reference point; determining the line of sight transformation according to the line of sight adjustment angle and the human eye area relationship; according to the line-of-sight transformation relationship, the human eye area is adjusted to obtain the target image.

According to any one of the second aspect or the first possible implementation manner of the second aspect to the third possible implementation manner of the second aspect, in a fourth possible implementation manner of the image processing apparatus, The line of sight detection is realized through neural network detection.

According to a third possible implementation manner of the second aspect, in a fifth possible implementation manner of the image processing apparatus, the determination of the line-of-sight transformation relationship is implemented through neural network processing.

According to the second aspect, in the sixth possible implementation manner of the image processing device, the detection of the image to be processed is performed to determine the face area and eye area of the target object in the image to be processed , comprising: performing face detection on the image to be processed collected by the image acquisition component to obtain the face area of the target object in the image to be processed; performing face key point detection on the face area to obtain the target object The human face key points; according to the human eye key points in the human face key points, determine the human eye area of the target object in the image to be processed.

According to the sixth possible implementation manner of the second aspect, in the seventh possible implementation manner of the image processing device, performing line-of-sight detection on the face area and the human eye area to determine the The gaze point of the target object includes: determining the head posture of the target object according to the key points of the human face; judging whether the head posture satisfies a preset condition, and the preset condition includes pitch in the head posture The angle is less than or equal to the preset pitch angle threshold and the roll angle is less than or equal to the preset roll angle threshold; when the head posture satisfies the preset condition, the human face area and the human eyes Line-of-sight detection is performed in the region to determine the gaze point of the target object.

According to the sixth possible implementation manner of the second aspect, in the eighth possible implementation manner of the image processing device, performing line-of-sight detection on the face area and the eye area to determine the The gaze point of the target object includes: judging whether the key points of the human eye in the key points of the human face are complete; in the case that the key points of the human eye are complete, perform line-of-sight on the human face area and the human eye area Detecting, determining a gaze point of the target object.

According to the second aspect or any one of multiple possible implementation manners of the second aspect, in a ninth possible implementation manner of the image processing apparatus, the reference plane includes a plane where the reference point is located.

In a third aspect, an embodiment of the present application provides an image processing device, including: an image acquisition component, configured to acquire an image of a target object to obtain an image to be processed; a processor; a memory for storing instructions executable by the processor ; Wherein, the processor is configured to implement the image processing method of the first aspect or one or more of the multiple possible implementations of the first aspect when executing the instructions.

In the fourth aspect, the embodiments of the present application provide a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the above-mentioned first aspect or the first aspect can be realized One or several image processing methods in various possible implementations.

In the fifth aspect, the embodiments of the present application provide a computer program product, including computer readable code, or a non-volatile computer readable storage medium bearing computer readable code, when the computer readable code is stored in an electronic When running in the device, the processor in the electronic device executes the image processing method of the first aspect or one or more of the multiple possible implementations of the first aspect.

The embodiment of the present application can detect the image to be processed collected by the image acquisition component, determine the face area and eye area of the target object in the image to be processed, and perform sight line detection on the face area and eye area to obtain The gaze point of the target object, and then determine the line of sight angle of the target object according to the gaze point of the target object, and adjust the human eye area according to the line of sight angle to obtain the target image, so that the gaze point of the target object can be detected according to the image content, and then Determine the line of sight angle, and adjust the eye area of the target object based on the line of sight angle, which can not only improve the detection accuracy of the line of sight angle, but also realize the adjustment of the line of sight in any direction, so that the human eye line of sight in the target image can be maintained squarely, and the shooting can be improved effect and user experience.

These and other aspects of the present application will be made more apparent in the following description of the embodiment(s).

Description of drawings

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the application and, together with the specification, serve to explain the principles of the application.

Fig. 1 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.

Fig. 2 shows a block diagram of a software structure of an electronic device according to an embodiment of the present application.

Fig. 3 shows a flowchart of an image processing method according to an embodiment of the present application.

Fig. 4 shows a schematic diagram of viewing angles according to an embodiment of the present application.

Fig. 5 shows a schematic diagram of a process of determining a line-of-sight angle according to an embodiment of the present application.

Fig. 6 shows a flowchart of an image processing method according to an embodiment of the present application.

Fig. 7 shows a schematic diagram of a processing procedure of line of sight adjustment according to an embodiment of the present application.

Fig. 8 shows a block diagram of an image processing device according to an embodiment of the present application.

detailed description

Various exemplary embodiments, features, and aspects of the present application will be described in detail below with reference to the accompanying drawings. The same reference numbers in the figures indicate functionally identical or similar elements. While various aspects of the embodiments are shown in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.

The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior or better than other embodiments.

In addition, in order to better illustrate the present application, numerous specific details are given in the following specific implementation manners. It will be understood by those skilled in the art that the present application may be practiced without certain of the specific details. In some instances, methods, means, components and circuits well known to those skilled in the art have not been described in detail in order to highlight the gist of the present application.

In related technologies, the sight line of human eyes is usually adjusted by generative adversarial networks (GAN). For example, the image of the human eye area in the image and the target angle (that is, the angle to which the line of sight needs to be adjusted) can be input into the GAN for processing to obtain an image after the adjustment of the human eye line of sight.

However, when GAN is used for line-of-sight adjustment, unreal information usually exists in the adjusted image (that is, the generated image), which may change the original shape of the human eye and cause image distortion. In addition, the size of the input image and output image of the generative confrontation network is usually a fixed value, which cannot support the line-of-sight adjustment of arbitrary high-resolution images.

In other techniques, the gaze of the human eye is adjusted using convolutional neural networks (CNN). For example, the image of the human eye area and the adjusted angle in the image can be input into the CNN for processing to obtain the adjusted image of the human eye line of sight.

In this approach, the input to the convolutional neural network includes adjusting the angle. Due to the poor accuracy of the currently detected line of sight angle, it cannot meet the needs of line of sight adjustment. It is usually assumed that the user's line of sight has a fixed angle of deviation, and the adjustment angle is also set to a fixed value, but the fixed adjustment angle cannot meet the needs of any direction. Sight adjustment. For example, it is usually assumed that the user looks at the center of the screen in the vertical screen state of electronic devices such as mobile phones and tablets, and adjusts the line of sight by adjusting the line of sight upwards at a fixed angle. When the user looks at the center of the screen in the landscape state, the The line of sight adjusts the fixed angle upwards, causing the line of sight to be adjusted incorrectly.

In addition, the size of the input image and output image of the convolutional neural network is usually a fixed value, which cannot support the line-of-sight adjustment of arbitrary high-resolution images.

In order to solve the above technical problems, the present application provides an image processing method. The image processing method of the embodiment of the present application can detect the image to be processed collected by the image acquisition component, and determine the face area of the target object in the image to be processed. and the human eye area, and perform line-of-sight detection on the face area and the human eye area to obtain the gaze point of the target object, and then determine the line-of-sight angle of the target object according to the gaze point of the target object, and perform visual inspection on the human-eye area according to the line-of-sight angle Adjust to obtain the target image, so that the gaze point of the target object can be detected according to the image content, and then the line of sight angle can be determined, and the eye area of the target object can be adjusted based on the line of sight angle, which can not only improve the detection accuracy of the line of sight angle, but also realize The line of sight adjustment in any direction keeps the sight line of the human eye in the target image (that is, the image after the line of sight adjustment) facing squarely, improving the shooting effect and user experience.

The image processing method of the embodiment of the present application can be applied to electronic equipment. The electronic device can be a touch screen or a non-touch screen. The touch screen electronic device can be controlled by clicking or sliding on the display screen with a finger, a stylus, etc., and the non-touch screen electronic device can be connected to Input devices such as mouse, keyboard, and touch panel are controlled through the input devices.

Fig. 1 shows a schematic structural diagram of an electronic device 100 according to an embodiment of the present application.

The electronic device 100 may include a cell phone, a foldable electronic device, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cell phone, a personal Digital assistant (personal digital assistant, PDA), augmented reality (augmented reality, AR) equipment, virtual reality (virtual reality, VR) equipment, artificial intelligence (artificial intelligence, AI) equipment, wearable equipment, vehicle equipment, smart home equipment, or at least one of smart city equipment. The embodiment of the present application does not specifically limit the specific type of the electronic device 100 .

The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) connector 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, and an antenna 2 , mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display screen 194, and A subscriber identification module (subscriber identification module, SIM) card interface 195 and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.

It can be understood that, the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100 . In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown in the figure, or combine certain components, or separate certain components, or arrange different components. The illustrated components can be realized in hardware, software or a combination of software and hardware.

The processor 110 may include one or more processing units, for example: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural network processor (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors. The processor can generate an operation control signal according to the instruction opcode and the timing signal, and complete the control of fetching and executing the instruction.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in processor 110 may be a cache memory. The memory may store instructions or data used by the processor 110 or used frequently. If the processor 110 needs to use the instruction or data, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thus improving the efficiency of the system.

In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and /or universal serial bus (universal serial bus, USB) interface, etc. The processor 110 may be connected to modules such as a touch sensor, an audio module, a wireless communication module, a display, and a camera through at least one of the above interfaces.

It can be understood that the interface connection relationship between the modules shown in the embodiment of the present application is only a schematic illustration, and does not constitute a structural limitation of the electronic device 100 . In other embodiments of the present application, the electronic device 100 may also adopt different interface connection manners in the foregoing embodiments, or a combination of multiple interface connection manners.

The electronic device 100 may implement a display function through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos and the like. The display screen 194 includes a display panel. The display panel can adopt liquid crystal display (liquid crystal display, LCD), organic light-emitting diode (organic light-emitting diode, OLED), active matrix organic light-emitting diode or active-matrix organic light emitting diode (active-matrix organic light emitting diode) diode, AMOLED), flexible light-emitting diode (flex light-emitting diode, FLED), Miniled, MicroLed, Micro-oLed, quantum dot light-emitting diodes (quantum dot light emitting diodes, QLED), etc. In some embodiments, the electronic device 100 may include one or more display screens 194 .

The electronic device 100 can realize the camera function through the camera 193, ISP, video codec, GPU, display screen 194, application processor AP, neural network processor NPU, and the like.

The camera 193 can be used to collect color image data and depth data of the subject. The ISP can be used to process the color image data collected by the camera 193 . For example, when taking a picture, open the shutter, the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye. ISP can also perform algorithm optimization on image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.

In some embodiments, the electronic device 100 may include one or more cameras 193 . Specifically, the electronic device 100 may include a front camera 193 and a rear camera 193 . Among them, the front camera 193 can usually be used to collect the color image data and depth data of the photographer facing the display screen 194, and the rear camera can be used to collect the color image data and depth data of the object (such as people, scenery, etc.) the photographer is facing. Image data as well as depth data.

In some embodiments, the CPU, GPU or NPU in the processor 110 can process the color image data and depth data collected by the camera 193 .

In some embodiments, the processor 110 can detect the image to be processed collected by the image acquisition component (such as the camera 193, etc.), determine the face area and the eye area of the target object in the image to be processed, and analyze the face area and human eye area to detect the line of sight, determine the gaze point of the target object, and the gaze point is used to indicate the position of the sight line of the target object on the preset reference plane; then according to the gaze point, determine the line of sight angle of the target object, and according to the line of sight The angle is adjusted to the human eye area to obtain the target image.

The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture. The embodiment of the present application takes the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100 .

The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces. In some embodiments, the Android system is divided into five layers, which are application program layer, application program framework layer, Android runtime (Android runtime, ART) and native C/C++ library, hardware abstraction layer (Hardware Abstract Layer, HAL) and the kernel layer.

The application layer can consist of a series of application packages.

As shown in Figure 2, the application package may include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.

The application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer. The application framework layer includes some predefined functions.

As shown in Figure 2, the application framework layer may include window managers, content providers, view systems, resource managers, notification managers, activity managers, input managers, and so on.

The window manager provides window management service (Window Manager Service, WMS). WMS can be used for window management, window animation management, surface management and as a transfer station for input systems.

Content providers are used to store and retrieve data and make it accessible to applications. This data can include videos, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.

The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. The view system can be used to build applications. A display interface can consist of one or more views. For example, a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.

The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.

The notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction. For example, the notification manager is used to notify the download completion, message reminder and so on. The notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, prompting text information in the status bar, issuing a prompt sound, vibrating the electronic device, and flashing the indicator light, etc.

The activity manager can provide activity management service (Activity Manager Service, AMS), AMS can be used for system components (such as activities, services, content providers, broadcast receivers) to start, switch, schedule, and manage and schedule application processes .

The input manager can provide input management service (Input Manager Service, IMS), and IMS can be used to manage the input of the system, such as touch screen input, key input, sensor input, etc. IMS fetches events from input device nodes, and distributes events to appropriate windows through interaction with WMS.

The Android runtime includes the core library and the Android runtime. The Android runtime is responsible for converting source code into machine code. The Android runtime mainly includes the use of ahead of time (ahead or time, AOT) compilation technology and just in time (just in time, JIT) compilation technology.

The core library is mainly used to provide basic Java class library functions, such as basic data structure, mathematics, IO, tools, database, network and other libraries. The core library provides APIs for users to develop Android applications. .

A native C/C++ library can include multiple functional modules. For example: surface manager (surface manager), media framework (Media Framework), libc, OpenGL ES, SQLite, Webkit, etc.

Among them, the surface manager is used to manage the display subsystem, and provides the fusion of 2D and 3D layers for multiple applications. The media framework supports playback and recording of various commonly used audio and video formats, as well as still image files. The media library libc can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc. OpenGL ES provides the drawing and manipulation of 2D graphics and 3D graphics in applications. SQLite provides a lightweight relational database for applications of the electronic device 100 .

The hardware abstraction layer runs in user space, encapsulates the kernel layer driver, and provides a call interface to the upper layer.

The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.

The workflow of the software and hardware of the electronic device 100 will be exemplarily described below in combination with a selfie scene.

When the user takes a selfie through the front camera of the electronic device 100, after clicking the camera button, the corresponding hardware interrupt information is sent to the kernel layer, and the input manager of the application framework layer obtains the interrupt information from the kernel layer and recognizes it. The application corresponding to the interrupt information is a camera application, and the camera application calls the camera driver through the interface of the application framework layer and the kernel layer, and then captures images or videos through the front camera. After obtaining the image or video, the electronic device 100 may adjust the line of sight of human eyes in the image or video through the processor 110 to obtain an adjusted image or video.

The image processing method of the embodiment of the present application can automatically detect and adjust (or correct) the sight line of the human eye in the image or video, and can be used in scenes such as single-person Selfie, multi-person Selfie, video call, etc. that are shot through the front camera. For example, in the scenario where a single person takes a selfie through the front camera of a mobile phone, after the user clicks the photo button, the image to be processed is obtained. The image to be processed can be regarded as an intermediate image that is not displayed to the user after shooting; the processor in the mobile phone can Detect and adjust the human eye sight in the image to be processed to obtain the target image, and store the target image in the gallery or album. When the user opens the gallery or photo album to browse photos, the photos he sees are photos with adjusted eyesight.

The image processing method of the embodiment of the present application can also be used in scenes that are photographed by a rear camera (such as group photo shooting, etc.) and scenes that adjust the sight line of the human eye to the photographed results of other image acquisition devices. The image processing method in the embodiment of the present application can also be used in other scenarios where the line of sight of the human eye needs to be adjusted, which is not specifically limited in the present application.

Fig. 3 shows a flowchart of an image processing method according to an embodiment of the present application. As shown in Figure 3, the image processing method includes:

Step S310, detecting the image to be processed collected by the image acquisition component, and determining the face area and eye area of the target object in the image to be processed.

Wherein, the image acquisition component may be a component capable of image or video acquisition such as a camera, video camera, camera, etc., and the image acquisition component may be integrated in the electronic device, or may be an independent component. This application does not limit the specific type and setting method of the image acquisition component.

The image to be processed may be an image (such as a photo) captured by the image capture component, or any video frame in the video captured by the image capture component. The image to be processed can be the image directly collected by the image acquisition unit, or the image obtained by further processing the image acquired by the image acquisition unit. The further processing includes various image imaging processing or enhancement processing. The processing can be performed by a circuit , the circuit can be a hardware circuit or can run suitable software, such as the circuit is an image signal processor (ISP).

The image to be processed may include at least one target object, which may include a person. That is to say, the image to be processed may be a photo of a single person or a video frame including one person, and the image to be processed may also be a photo of multiple people or a video frame including multiple people.

The image to be processed can be detected, and the face area and eye area of the target object in the image to be processed can be determined. For example, target recognition can be performed on the processed image to determine the target object in the image to be processed, that is, to determine the area where the target object is located from the image to be processed, and then detect the area where the target object is located to determine the face of the target object area and the human eye area.

In a possible implementation, when determining the face area and eye area of the target object, face detection can be performed on the image to be processed first, to obtain the face area of the target object in the image to be processed, that is, the face area of the target object The position of the face frame; then perform face key point detection on the face area to obtain the face key points of the target object, and determine the human eye area of the target object in the image to be processed according to the human eye key points in the face key points. Optionally, face detection can be performed through a pre-trained face detection model (such as a convolutional neural network CNN), and face key point detection can also be performed through a pre-trained human face key point detection model. This application does not limit the specific methods of face detection and face key point detection.

Through face detection and face key point detection, the face area and eye area of the target object can be determined, which can improve the processing efficiency.

Step S320, performing line-of-sight detection on the face area and the eye area to determine the gaze point of the target object.

Wherein, the gaze point may be used to indicate the position of the line of sight of the target object on the preset reference plane. The reference plane may be the plane where the lens of the image acquisition component is located, or other preset planes. For example, in a scene where the user uses the front camera of the mobile phone to take a selfie, the plane where the screen of the mobile phone is located may be determined as the reference plane, and the plane where the front camera is located may also be determined as the reference plane. The present application does not limit the specific position of the reference plane.

In a possible implementation manner, a reference point may be preset on the image acquisition component, and the plane where the reference point is located is determined as the reference plane. Wherein, the reference point on the image acquisition component can be set according to the actual situation. For example, assuming that the image acquisition component is a camera, any point in the position of the camera or the center point of the position of the camera may be determined as a reference point on the camera. The reference point may also be other points on the image acquisition component, and the present application does not limit the specific position of the reference point on the image acquisition component.

When determining the gaze point of the target object, you can first judge whether the key points of the human eye in the key points of the face are complete. If the key points of the human eye are complete, then perform line-of-sight detection on the face area and the human eye area to determine the target object point of gaze. In the case that the key points of the human eyes of the target object are incomplete, the processing process ends without adjusting the line of sight of the target object.

By judging the key points of the human eye, it is possible to detect the line of sight of the face area and the human eye area of the target object when the eyes of the target object are not blocked, and determine the gaze point, thereby improving the accuracy of the gaze point of the target object.

In a possible implementation manner, when determining the gaze point of the target object, the head posture of the target object may also be determined according to key points of the target object's face. The head posture can be represented by three Euler angles: pitch, roll, and raw. Among them, the pitch angle corresponds to raising or lowering the head, the yaw angle corresponds to shaking the head, and the roll angle corresponds to turning the head.

After determining the head pose of the target object, it can be judged whether the head pose of the target object satisfies a preset condition, and the preset condition may include that the pitch angle in the head pose is less than or equal to a preset pitch angle threshold and the roll angle is less than or equal to Preset roll angle threshold. When the head posture satisfies the preset conditions, the line of sight detection is performed on the face area and the human eye area to determine the gaze point of the target object.

When the head pose of the target object does not meet the preset conditions, it can be considered that the image to be processed is a photo taken by the user in the scene of the high-angle head pose of the target object. In order to avoid disturbing the user's shooting intention, the processing process can be ended. The line of sight of the target object in the scene is not adjusted.

For example, when the pitch angle in the head pose of the target object is greater than the preset pitch angle threshold, that is, when the head pose of the target object is raised or lowered at a large angle, it may be considered that the image to be processed is taken by the user. In order to avoid interfering with the user's shooting intention, the target subject's line of sight is not adjusted.

For another example, when the roll angle in the head pose of the target object is greater than the preset roll angle threshold, that is, when the head pose of the target object is turned at a large angle (such as a side face), it can be considered The processed image is a photo of the side face of the target object taken by the user. In order to avoid disturbing the user's shooting intention, the line of sight of the target object is not adjusted.

Through the head pose detection, it is possible to filter the photos in the scene of the large-angle head pose of the target object when the human eye sight is adjusted, so as to reduce the interference with the user's shooting intention and improve the user experience.

In a possible implementation, when determining the gaze point of the target object, it is possible to first determine whether the key points of the human eyes of the target object are complete; when the key points of the human eyes of the target object are complete, then determine the head of the target object Whether the posture meets the preset conditions; when the head posture of the target object meets the preset conditions, the line of sight detection is performed on the face area and the human eye area to obtain the gaze point of the target object.

In a possible implementation, when determining the gaze point of the target object, it is also possible to first determine whether the head posture of the target object satisfies the preset condition; Whether the key points of the human eyes of the target object are complete; if the key points of the human eyes of the target object are complete, perform line-of-sight detection on the face area and the human eye area to obtain the gaze point of the target object.

It should be noted that those skilled in the art may set the judgment order of the above two conditions according to actual conditions, which is not limited in the present application.

In a possible implementation manner, line-of-sight detection may be implemented through neural network detection. For example, in the case where the image processing method in the embodiment of the present application is implemented by a neural network, the neural network may include a line of sight detection subnetwork, and the line of sight detection subnetwork may be used to detect the line of sight of the face area and the human eye area to obtain the target The object's gaze point.

For example, for any target object in the image to be processed, after selecting 1 face area and 2 human eye areas of the target object from the image to be processed, the target object can be detected according to the input size of the line of sight detection subnetwork. 1 face area and 2 human eye areas are preprocessed, such as down-sampling or up-sampling, and the pre-processed 1 face area and 2 human eye areas are input into the line of sight detection sub-network, and the line of sight detection sub-network Line-of-sight detection is performed on the three area images of the target object to obtain the gaze point of the target object.

Wherein, the line-of-sight detection sub-network is a pre-trained convolutional neural network (CNN) for line-of-sight detection, a residual network (residual network, ResNet), etc. This application does not limit the network type of the line-of-sight detection sub-network.

A neural network (such as a line of sight detection sub-network) is used to detect the gaze of the face area and the human eye area to obtain the gaze point of the target object, which can not only improve the processing efficiency, but also improve the accuracy of the gaze point of the target object.

Step S330: Determine the sight angle of the target object according to the gaze point.

Wherein, the line-of-sight angle of the target object is used to indicate the offset of the gaze point of the target object relative to the reference point on the image acquisition component. When determining the line-of-sight angle of the target object, the pupil pixel distance of the target object can be determined according to the number of pixels between the center points of the pupils of the target object in the face area, and according to the pupil pixel distance, the preset pupil physical The distance and shooting parameters of the image acquisition component determine the first distance between the human eyes of the target object and the gaze point during image acquisition. The first distance may also be referred to as the human eye distance.

Wherein, the pupil physical distance refers to the real distance between the pupils of the two eyes, which can be determined through statistics. For example, if the statistical value of the real distance between the pupils of the two eyes is 59mm, the physical distance between the pupils can be preset as 59mm. Those skilled in the art can determine the specific value of the pupil physical distance according to the actual statistical value, which is not limited in the present application.

The capture parameters of the image capture component may be used to indicate the configuration parameters when the image capture component captures (or captures) the image to be processed. For example, when the image acquisition component is a camera, its shooting parameters include at least one of the field of view (field of view, FOV), focal length, and sensor size of the camera.

In a possible implementation, according to the pupil pixel distance, the preset pupil physical distance and the field of view FOV in the shooting parameters of the image acquisition component, the first distance between the human eye and the fixation point of the target object during image acquisition can be calculated. a distance.

In a possible implementation, the distance between the human eye and the gaze point of the target object during image acquisition can also be calculated according to the pupil pixel distance, the preset pupil physical distance, the focal length and the sensor size in the shooting parameters of the image acquisition component. the first distance of .

Optionally, the first distance face_distance between the eyes of the target object and the gaze point can be determined by the following formula (1):

In the formula (1), f (mm) represents the focal length in the shooting parameters of the image acquisition component, such as the focal length of the camera; IPD (mm) represents the preset pupil physical distance, and image_width (pixel) represents the person of the target object in the image to be processed The pixel width of the face area, IPD (pixels) represents the pupil pixel distance of the target object, and sensor_width (mm) represents the width value of the sensor size in the shooting parameters of the image acquisition component.

Then, according to the gaze point and the first distance, the eye position of the target object during image acquisition can be determined, and the sight angle of the target object can be calculated through the triangular relationship among the eye position, gaze point, and reference point.

Fig. 4 shows a schematic diagram of viewing angles according to an embodiment of the present application. As shown in FIG. 4 , the line-of-sight angle A can be calculated according to the triangular relationship among the human eye position, gaze point, and reference point. Wherein, the distance between the human eye position and the gaze point is the first distance (ie, the human eye distance).

In a possible implementation, when calculating the line-of-sight angle of the target object according to the above triangular relationship, the second distance between the reference point and the fixation point can be determined, and the target object's distance can be determined according to the first distance and the second distance. line of sight angle.

For example, the sight angle A of the target object can be determined by the following formula (2):

A＝arctan(gaze_distance/face_distance) (2)

In formula (2), arctan represents the arc tangent, gaze_distance represents the second distance, that is, the distance between the reference point and the gaze point, and face_distance represents the first distance, that is, the distance between the human eye position and the gaze point.

Fig. 5 shows a schematic diagram of a process of determining a line-of-sight angle according to an embodiment of the present application. As shown in Figure 5, after determining the human face region and human eye region of the target object from the image to be processed, one human face region 501 and two human eye regions 502 of the target object can be selected from the image to be processed respectively, and Carry out preprocessing (not shown in the figure) to human face area 501 and human eye area 502, then input line of sight detection sub-network 506 to carry out line of sight detection, obtain the gaze point 509 of the target object;

It is also possible to determine the pupil pixel distance 505 of the target object according to the number of pixels between the central points of the pupils of the target object in the face area 501, and according to the preset physical pupil distance 503 and the shooting parameters 504 of the image acquisition component And pupil pixel distance 505, determine the human eye distance 508 of the target object, that is, the distance between the human eye of the target object and the gaze point during image acquisition;

Then, according to the reference point 507 on the image acquisition component, the human eye distance 508 and the fixation point 509, the sight angle 510 of the target object can be determined through the above formula (2).

The embodiment of the present application determines the angle of sight of the target object through the triangle relationship determined by the point of gaze, the distance of the human eye, and the reference point. Compared with the prior art (directly inputting the image of the face area into the network regression model to obtain the angle of sight), not only The detection difficulty of the line-of-sight angle can be greatly reduced, and the detection accuracy of the line-of-sight angle can also be improved.

Step S340, adjusting the human eye area according to the viewing angle to obtain a target image.

In a possible implementation, the sight angle of the target object and the human eye area of the target object in the image to be processed can be input into a model such as a convolutional neural network to adjust the human eye area to obtain the target image, so that the target image The human eye's line of sight can be kept emmetropic, that is, the correction of the human eye's line of sight can be realized.

In a possible implementation, the line-of-sight transformation relationship of each pixel in the human eye area can be determined according to the line-of-sight angle and the reference point, for example, the line-of-sight transformation function, etc., and according to the line-of-sight transformation relationship, the person in the image to be processed The eye area is adjusted to obtain the target image. Wherein, the line-of-sight transformation relationship may also be expressed in other ways, which is not limited in the present application.

It should be noted that those skilled in the art can set the adjustment mode of the human eye area in the image to be processed according to the actual situation, which is not limited in the present application.

The image processing method of this embodiment can detect the image to be processed collected by the image acquisition component, determine the face area and eye area of the target object in the image to be processed, and perform sight line detection on the face area and eye area , get the gaze point of the target object, and then determine the gaze angle of the target object according to the gaze point of the target object, and adjust the human eye area according to the gaze angle to obtain the target image, so that the gaze point of the target object can be detected according to the image content , and then determine the line of sight angle, and adjust the human eye area of the target object based on the line of sight angle, which can not only improve the detection accuracy of the line of sight angle, but also realize the line of sight adjustment in any direction, so that the human eye line of sight in the target image remains square, Improve shooting effect and user experience.

Fig. 6 shows a flowchart of an image processing method according to an embodiment of the present application. As shown in FIG. 6 , the image processing method of this embodiment may include step S310 , step S320 , step S330 , step S3401 , step S3402 and step S3403 . Wherein, step S3401 , step S3402 and step S3403 are a possible more detailed implementation of step S340 in the embodiment shown in FIG. 3 .

Wherein, the line-of-sight angle is used to indicate the offset of the gaze point relative to a reference point on the image acquisition component.

Optionally, steps S310 , S320 , and S330 in the embodiment shown in FIG. 6 are similar to steps S310 , S320 , and S330 in the embodiment shown in FIG. 3 , and will not be repeatedly described here.

Step S3401, determining a line-of-sight adjustment angle according to the line-of-sight angle and the reference point on the image acquisition component.

When determining the adjustment angle of the line of sight, the target position to which the line of sight is adjusted can be set as: reference point, reference point + preset angle, reference point - preset angle, etc. according to the actual situation.

When the target position is the reference point, the line-of-sight angle of the target object can be directly determined as the line-of-sight adjustment angle; when the target position is "reference point + preset angle", the "line-of-sight angle + preset angle" It is determined as the line of sight adjustment angle; when the target position is "reference point - preset angle", the "line of sight angle - preset angle" can be determined as the line of sight adjustment angle.

It should be noted that those skilled in the art can determine the target position to which the line of sight is adjusted according to the actual situation, and the present application does not limit this.

Step S3402: Determine a line-of-sight transformation relationship according to the line-of-sight adjustment angle and the eye area.

In a possible implementation manner, determining the line-of-sight transformation relationship may be implemented through neural network processing. For example, in the case where the image processing method in the embodiment of the present application is implemented by a neural network, the neural network may further include a line of sight transformation subnetwork for determining a line of sight transformation relationship. Optionally, the line-of-sight transformation relationship may include a first line-of-sight transformation matrix.

According to the input size of the line of sight transformation sub-network, the human eye area selected from the image to be processed can be down-sampled to obtain the down-sampled (or up-sampled) human eye area, so that the down-sampled (or up-sampled) human eye area The size of the human eye region matches the input size of the gaze transformation subnetwork.

Then the human eye area after adjusting the line of sight angle and down-sampling (or up-sampling) is input into the line-of-sight transformation sub-network for processing to obtain the second line-of-sight transformation matrix, and up-sampling (or down-sampling) the second line-of-sight transformation matrix to obtain The first line-of-sight transformation matrix, so that the size of the first line-of-sight transformation matrix matches the size of the human eye area.

Determine the second line-of-sight transformation matrix through the line-of-sight transformation sub-network, and perform up-sampling or down-sampling on the second line-of-sight transformation matrix to obtain the first line-of-sight transformation matrix matching the size of the human eye area, so that the first line-of-sight transformation matrix can be directly Acts on the human eye area at native resolution, thus supporting line-of-sight adjustment for images of any resolution.

Step S3403, according to the line-of-sight transformation relationship, adjust the human eye area to obtain a target image.

That is to say, according to the first line-of-sight transformation matrix, the human eye area in the image to be processed can be processed to obtain the target image, so that the human eye line of sight in the target image can be kept square, that is, the line of sight of the human eye can be corrected. Wherein, the resolution of the target image is the same as that of the image to be processed.

Fig. 7 shows a schematic diagram of a processing procedure of line of sight adjustment according to an embodiment of the present application. As shown in Figure 7, the human eye area 701 in the image to be processed can be down-sampled 702 to obtain the down-sampled human eye area, and the line of sight can be determined according to the line-of-sight angle 710 of the target object and the reference point 709 on the image acquisition component Adjusting the angle 711; then, the downsampled human eye area and line of sight adjustment angle 711 are input into the line of sight transformation sub-network 703 for processing to obtain the second line of sight transformation matrix 704, and the second line of sight transformation matrix 704 is up-sampled 705, Obtain the first line of sight transformation matrix 706, wherein the size of the first line of sight transformation matrix 706 matches the size of the human eye area 701 in the image to be processed; according to the first line of sight transformation matrix 706, the human eye area 701 in the image to be processed Sight adjustment 707 is performed to obtain a target image 708 . It should be understood that the down-sampling and up-sampling in this embodiment are not necessary processes, and the purpose of down-sampling is only to reduce the burden of data processing.

The image processing method of this embodiment can detect the image to be processed collected by the image acquisition component, determine the face area and eye area of the target object in the image to be processed, and perform sight line detection on the face area and eye area , to obtain the gaze point of the target object, and then determine the line of sight angle of the target object according to the gaze point of the target object, and determine the line of sight adjustment angle according to the line of sight angle and the reference point on the image acquisition component; then adjust the angle and the human eye area according to the line of sight , to determine the line-of-sight transformation relationship; according to the line-of-sight transformation relationship, the human eye area is adjusted to obtain the target image, so that the line-of-sight transformation relationship can be determined, and the line-of-sight transformation relationship can be directly applied to the human eye area in the image to be processed to achieve any resolution The line-of-sight adjustment of the image.

In the case where the image processing method of the embodiment of the present application is implemented by a neural network, the neural network may include a line of sight detection subnetwork and a line of sight transformation subnetwork, and the method may also include: according to the preset first training set, The line-of-sight detection sub-network is trained, and the first training set includes reference line-of-sight angles of a plurality of sample objects, human face area reference images and human eye area reference images of a plurality of sample objects; according to the preset second training set , training the sight line transformation sub-network, the second training set includes a plurality of human eye area reference images, reference line of sight adjustment angles and reference line of sight transformation relationships corresponding to each human eye area reference image.

When training the line of sight detection sub-network, the face area reference image and eye area reference image of any sample object in the first training set can be input into the line of sight detection sub-network for line of sight detection, and the line of sight angle of the sample object can be obtained , and determine the difference between the line-of-sight angle of the sample object and its reference line-of-sight angle; then according to the differences between the line-of-sight angles of multiple sample objects in the first training set and their reference line-of-sight angles, determine the network loss of the line-of-sight detection sub-network, and According to the network loss of the line-of-sight detection sub-network, its network parameters are adjusted.

When the line of sight detection sub-network meets the preset first training end condition, the training can be ended to obtain a trained line of sight detection sub-network. The trained line-of-sight detection sub-network can be applied to the above embodiments to perform line-of-sight detection on the face area and eye area of the target object to obtain the gaze point of the target object.

Among them, the first training end condition can be, for example, that the training rounds of the line of sight detection sub-network reach the preset threshold, the network loss convergence domain of the line of sight detection sub-network is within a certain range, and the line of sight detection sub-network is verified on the first preset verification set. pass. Those skilled in the art can set the specific content of the first training end condition according to the actual situation, and the application does not limit this.

When training the line-of-sight transformation sub-network, any reference image of the face area in the second training set and the reference line-of-sight corresponding to the reference image of the human-eye area can be adjusted in angle, and input into the line-of-sight transformation sub-network for processing to obtain The line of sight transformation relationship corresponding to the eye area reference image, and determine the difference between the line of sight transformation relationship corresponding to the human eye area reference image and its reference line of sight transformation relationship; then according to the line of sight transformation relationship of multiple human face area reference images in the second training set Instead of referring to the difference between the line-of-sight transformation relations, the network loss of the line-of-sight transformation sub-network is determined, and the network parameters of the line-of-sight transformation sub-network are adjusted according to the network loss of the line-of-sight transformation sub-network.

When the line of sight conversion sub-network satisfies the preset second training end condition, the training can be ended to obtain a trained line of sight conversion sub-network. The trained line-of-sight transformation sub-network can be applied to the above embodiments to determine the line-of-sight transformation relationship.

Wherein, the second training end condition can be, for example, that the training rounds of the line-of-sight transformation sub-network reach a preset threshold, the network loss convergence domain of the line-of-sight transformation sub-network is within a certain range, and the line-of-sight transformation sub-network is verified on the preset second verification set. pass. Those skilled in the art can set the specific content of the second training end condition according to the actual situation, which is not limited in the present application.

Through the first training set and the second training set, the line-of-sight detection sub-network and the line-of-sight transformation sub-network in the neural network are respectively trained, which can improve the accuracy of the line-of-sight detection sub-network and the line-of-sight transformation sub-network.

The image processing method of the embodiment of the present application can automatically detect and correct the sight line of the human eye in the image, so that the line of sight of the human eye in the corrected target image remains square, which improves the photographing effect and photographing experience. For example, for a single-person scene, when the user takes a selfie, he can look at the screen to understand the overall effect of the image, and at the same time maintain the effect of people's eyes looking squarely in the shot; for a multi-person selfie scene, it is difficult to ensure that everyone Looking at the camera, by automatically detecting and correcting the human eye line of sight in the photo, it saves the user's follow-up processing and improves the efficiency of taking photos.

The image processing method of the embodiment of the present application can support the correction of the sight line of human eyes in any direction. For example, it supports not only taking pictures on the horizontal screen of the mobile phone, but also supports taking pictures on the vertical screen of the mobile phone. The photos taken by the mobile phone camera in any direction can automatically detect and correct the human eye sight in the photos. The user does not need to do any operation, and the use is simple and convenient.

The image processing method of the embodiment of the present application can support the detection and correction of human eye sight in any high-resolution image without reducing the resolution and definition of the human eye area in the image. Moreover, by down-sampling the input image of the line-of-sight transformation sub-network and up-sampling the output result, on the premise of supporting line-of-sight correction of high-resolution images, the input size of the line-of-sight transformation sub-network is a fixed size, which can improve processing efficiency, making The calculation amount of the line-of-sight correction process for images with different resolutions is basically the same, which is very friendly to mobile low-power electronic devices such as mobile phones.

Fig. 8 shows a block diagram of an image processing device according to an embodiment of the present application. As shown in Figure 8, the image processing device includes:

An image acquisition component 810, configured to acquire an image of the target object to obtain an image to be processed;

The processing component 820 is configured to: detect the image to be processed, determine the face area and eye area of the target object in the image to be processed; Detecting and determining the gaze point of the target object, the gaze point is used to indicate the position of the sight line of the target object on a preset reference plane; according to the gaze point, determine the gaze angle of the target object, the The line-of-sight angle is used to indicate the offset of the gaze point relative to the reference point on the image acquisition component; according to the line-of-sight angle, the human eye area is adjusted to obtain a target image.

In a possible implementation manner, the determining the line-of-sight angle of the target object according to the gaze point includes: determining a first distance between the human eyes of the target object and the gaze point; The gaze point, the reference point and the first distance are used to determine the line-of-sight angle of the target object.

In a possible implementation manner, the determining the line-of-sight angle of the target object according to the gaze point, the reference point, and the first distance includes: determining the distance between the reference point and the gaze point A second distance between them; according to the first distance and the second distance, determine the line-of-sight angle of the target object.

In a possible implementation manner, the adjusting the human eye area according to the sight angle to obtain the target image includes: determining a sight line adjustment angle according to the sight line angle and the reference point; The line of sight adjustment angle and the human eye area are used to determine a line of sight transformation relationship; according to the line of sight transformation relationship, the human eye area is adjusted to obtain the target image.

In a possible implementation manner, the line of sight detection is implemented through neural network detection.

In a possible implementation manner, the determination of the line-of-sight transformation relationship is implemented through neural network processing.

In a possible implementation manner, the detecting the image to be processed, and determining the face area and eye area of the target object in the image to be processed include: the image to be processed collected by the image acquisition component Carry out face detection to obtain the face area of the target object in the image to be processed; perform face key point detection on the face area to obtain the face key point of the target object; according to the face key The human eye key points in the points are used to determine the human eye area of the target object in the image to be processed.

In a possible implementation manner, the performing line-of-sight detection on the human face area and the human eye area, and determining the gaze point of the target object includes: determining the target object according to the key points of the human face The head pose of the subject; judging whether the head pose satisfies a preset condition, the preset condition includes that the pitch angle in the head pose is less than or equal to a preset pitch angle threshold and the roll angle is less than or equal to a preset roll angle Turning angle threshold; when the head posture satisfies the preset condition, line-of-sight detection is performed on the face area and the eye area to determine the gaze point of the target object.

In a possible implementation manner, the performing line-of-sight detection on the human face area and the human eye area, and determining the gaze point of the target object includes: judging the human eye key points in the human face key points Whether the points are complete; if the key points of the human eyes are complete, perform line-of-sight detection on the human face area and the human eye area to determine the gaze point of the target object.

In a possible implementation manner, the reference plane includes a plane where the reference point is located.

An embodiment of the present application provides an image processing device, including: an image acquisition component, a processor, and a memory for storing processor-executable instructions; wherein, the processor is configured to implement the above method when executing the instructions .

An embodiment of the present application provides a computer program product, including computer-readable codes, or a non-volatile computer-readable storage medium bearing computer-readable codes, when the computer-readable codes are stored in a processor of an electronic device When running in the electronic device, the processor in the electronic device executes the above method.

An embodiment of the present application provides a non-volatile computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is realized.

A computer readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. A computer readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (non-exhaustive list) of computer-readable storage media include: portable computer disk, hard disk, random access memory (Random Access Memory, RAM), read only memory (Read Only Memory, ROM), erasable Electrically Programmable Read-Only-Memory (EPROM or flash memory), Static Random-Access Memory (Static Random-Access Memory, SRAM), Portable Compression Disk Read-Only Memory (Compact Disc Read-Only Memory, CD -ROM), Digital Video Disc (DVD), memory sticks, floppy disks, mechanically encoded devices such as punched cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the foregoing .

Computer readable program instructions or codes described herein may be downloaded from a computer readable storage medium to a respective computing/processing device, or downloaded to an external computer or external storage device over a network, such as the Internet, local area network, wide area network, and/or wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or a network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in each computing/processing device .

Computer program instructions for performing the operations of the present application may be assembly instructions, instruction set architecture (Instruction Set Architecture, ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state setting data, or in one or more source or object code written in any combination of programming languages, including object-oriented programming languages—such as Smalltalk, C++, etc., and conventional procedural programming languages—such as “like” languages or similar programming languages. Computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server implement. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it may be connected to an external computer (for example, using Internet Service Provider to connect via the Internet). In some embodiments, electronic circuits, such as programmable logic circuits, field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or programmable logic arrays (Programmable Logic Array, PLA), the electronic circuit can execute computer-readable program instructions, thereby realizing various aspects of the present application.

Aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.

These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine such that when executed by the processor of the computer or other programmable data processing apparatus , producing an apparatus for realizing the functions/actions specified in one or more blocks in the flowchart and/or block diagram. These computer-readable program instructions can also be stored in a computer-readable storage medium, and these instructions cause computers, programmable data processing devices and/or other devices to work in a specific way, so that the computer-readable medium storing instructions includes An article of manufacture comprising instructions for implementing various aspects of the functions/acts specified in one or more blocks in flowcharts and/or block diagrams.

It is also possible to load computer-readable program instructions into a computer, other programmable data processing device, or other equipment, so that a series of operational steps are performed on the computer, other programmable data processing device, or other equipment to produce a computer-implemented process , so that instructions executed on computers, other programmable data processing devices, or other devices implement the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.

The flowchart and block diagrams in the figures show the architecture, functions and operations of possible implementations of apparatuses, systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a portion of a program segment, or an instruction that includes one or more Executable instructions. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.

It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with hardware (such as circuits or ASIC (Application Specific Integrated Circuit, application-specific integrated circuit)), or can be implemented with a combination of hardware and software, such as firmware.

Although the present invention has been described in conjunction with various embodiments herein, in the process of implementing the claimed invention, those skilled in the art can understand and Other variations of the disclosed embodiments are implemented. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfill the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that these measures cannot be combined to advantage.

Having described various embodiments of the present application above, the foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and alterations will be apparent to those of ordinary skill in the art without departing from the scope of the described embodiments. The terminology used herein is chosen to best explain the principle of each embodiment, practical application or improvement of technology in the market, or to enable other ordinary skilled in the art to understand each embodiment disclosed herein.

Claims

An image processing method, characterized in that the method comprises:

Detecting the image to be processed collected by the image acquisition component, and determining the face area and eye area of the target object in the image to be processed;

Perform line-of-sight detection on the face area and the human-eye area, and determine a gaze point of the target object, where the gaze point is used to indicate the position of the line of sight of the target object on a preset reference plane;

Determining a line-of-sight angle of the target object according to the gazing point, where the line-of-sight angle is used to indicate an offset of the gazing point relative to a reference point on the image acquisition component;

According to the line-of-sight angle, the human eye area is adjusted to obtain a target image.
The method according to claim 1, wherein said determining the line-of-sight angle of said target object according to said gaze point comprises:

determining a first distance between the human eyes of the target object and the gaze point;

A line-of-sight angle of the target object is determined according to the gaze point, the reference point, and the first distance.
The method according to claim 2, wherein the determining the line-of-sight angle of the target object according to the gaze point, the reference point and the first distance comprises:

determining a second distance between the reference point and the gaze point;

A line-of-sight angle of the target object is determined according to the first distance and the second distance.
The method according to any one of claims 1 to 3, characterized in that, adjusting the human eye area according to the line-of-sight angle to obtain a target image includes:

determining a line of sight adjustment angle according to the line of sight angle and the reference point;

determining a line of sight conversion relationship according to the line of sight adjustment angle and the human eye area;

According to the line-of-sight transformation relationship, the human eye area is adjusted to obtain the target image.
The method according to any one of claims 1 to 4, characterized in that the line of sight detection is realized through neural network detection.
The method according to claim 4, characterized in that the determination of the line-of-sight transformation relationship is realized through neural network processing.
The method according to any one of claims 1 to 6, wherein the reference plane comprises a plane where the reference point is located.
An image processing device, characterized in that the device comprises:

The image acquisition component is used for image acquisition of the target object to obtain the image to be processed;

processing component, configured to:

Detecting the image to be processed, and determining the face area and eye area of the target object in the image to be processed;

Perform line-of-sight detection on the face area and the human-eye area, and determine a gaze point of the target object, where the gaze point is used to indicate the position of the line of sight of the target object on a preset reference plane;

Determining a line-of-sight angle of the target object according to the gazing point, where the line-of-sight angle is used to indicate an offset of the gazing point relative to a reference point on the image acquisition component;

According to the line-of-sight angle, the human eye area is adjusted to obtain a target image.
The device according to claim 8, wherein the determining the line-of-sight angle of the target object according to the gaze point comprises:

determining a first distance between human eyes of the target object and the gaze point;

A line-of-sight angle of the target object is determined according to the gaze point, the reference point, and the first distance.
The device according to claim 9, wherein the determining the line-of-sight angle of the target object according to the gaze point, the reference point and the first distance comprises:

determining a second distance between the reference point and the gaze point;

A line-of-sight angle of the target object is determined according to the first distance and the second distance.
The device according to any one of claims 8 to 10, wherein the adjusting the human eye area according to the line-of-sight angle to obtain the target image includes:

determining a line of sight adjustment angle according to the line of sight angle and the reference point;

determining a line of sight conversion relationship according to the line of sight adjustment angle and the human eye area;

According to the line-of-sight transformation relationship, the human eye area is adjusted to obtain the target image.
The device according to any one of claims 8 to 11, wherein the line of sight detection is realized through neural network detection.
The device according to claim 11, characterized in that, said determination of line-of-sight transformation relationship is realized by neural network processing.
The device according to any one of claims 8 to 13, wherein the reference plane includes a plane where the reference point is located.
An image processing device, characterized in that it comprises:

The image acquisition component is used for image acquisition of the target object to obtain the image to be processed;

processor;

memory for storing processor-executable instructions;

Wherein, the processor is configured to implement the method according to any one of claims 1 to 7 when executing the instructions.
A non-volatile computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions implement the method according to any one of claims 1 to 7 when executed by a processor.
A computer program product, comprising computer-readable codes, or a non-volatile computer-readable storage medium carrying computer-readable codes, when the computer-readable codes run in an electronic device, the The processor executes the method of any one of claims 1-7.