CN113822114A - Image processing method, related equipment and computer readable storage medium - Google Patents


Info

Publication number
CN113822114A
Authority
CN
China
Prior art keywords
image
eye
network
data
correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110600889.7A
Other languages
Chinese (zh)
Inventor
易阳
余晓铭
周易
涂娟辉
李峰
左小祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202110600889.7A priority Critical patent/CN113822114A/en
Publication of CN113822114A publication Critical patent/CN113822114A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides an image processing method, related equipment, and a computer-readable storage medium. The method comprises the following steps: acquiring an eye image to be processed, wherein the eye image comprises an image of a first eye region and an image of a second eye region; processing the eye image through a sight line correction network to obtain a target correction image, wherein the sight line correction network is trained on simulated eye sample images and real eye sample images of a target domain, and the simulated eye sample images are obtained by performing transfer learning on virtual eye sample images of a source domain; and filling the first eye region and the second eye region with the target correction image to obtain a corrected eye image. The method can be applied to image processing scenes based on big data technology and artificial-intelligence computer vision technology. Through binocular constraints and data domain migration, the eye gaze in an image can be corrected conveniently and accurately, achieving a real and natural correction effect.

Description

Image processing method, related equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, a related device, and a computer-readable storage medium.
Background
In a common video session scene, such as an online video conference, the user's eyes usually look at the screen below the camera rather than directly at the camera. This parallax can reduce the user's sense of real participation in the video session, so the user's line of sight needs to be corrected. Specifically, the eye region of a portrait in an image or a video can be finely adjusted by an algorithm so that the gaze direction of the eyes can be adjusted toward a target direction.
At present, most sight line correction schemes adjust the eye region based on geometric deformation. Such schemes must accurately compute the key points of the face or eye region, as well as the positions of the person's head and eyes relative to the camera and the screen; only with accurate relative positions and key points can the required adjustment strength and direction be calculated. These schemes are difficult to implement and have low accuracy, and especially when the eyeball deviation is large or the correction strength is high, the naturalness and realism of the correction result are often poor. Therefore, how to correct the line of sight conveniently and accurately while achieving a good correction effect has become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an image processing method, related equipment, and a computer-readable storage medium, which can correct the eye gaze conveniently and accurately through binocular constraints and data domain migration, thereby achieving a real and natural correction effect.
In a first aspect, an embodiment of the present application provides an image processing method, where the method includes:
an eye image to be processed is acquired, the eye image including an image of a first eye region and an image of a second eye region.
And processing the eye image through a sight line correction network to obtain a target correction image, wherein the sight line correction network is trained on simulated eye sample images and real eye sample images of a target domain, and the simulated eye sample images are obtained by performing transfer learning on virtual eye sample images of a source domain.
And filling the first eye region and the second eye region by using the target correction image to obtain a corrected eye image.
In a second aspect, an embodiment of the present application provides an image processing apparatus, including:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an eye image to be processed, and the eye image comprises an image of a first eye region and an image of a second eye region.
The processing module is used for processing the eye image through a sight line correction network to obtain a target correction image; the sight line correction network is trained on simulated eye sample images and real eye sample images of a target domain, and the simulated eye sample images are obtained by performing transfer learning on virtual eye sample images of a source domain.
The processing module is further configured to perform filling processing on the first eye region and the second eye region by using the target correction image to obtain a corrected eye image.
In a third aspect, an embodiment of the present application provides a computer device. The computer device includes a processor, a network interface, and a storage device that are connected to each other. The network interface is controlled by the processor to transmit and receive data, and the storage device stores a computer program comprising program instructions. The processor is configured to call the program instructions to execute the image processing method according to the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, the computer program including program instructions, which are executed by a processor to perform the image processing method according to the first aspect.
In a fifth aspect, the present application discloses a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the image processing method according to the first aspect.
In the embodiment of the application, a computer device can acquire an eye image to be processed, the eye image comprising an image of a first eye region and an image of a second eye region. The eye image is processed through a sight line correction network to obtain a target correction image; the sight line correction network is trained on simulated eye sample images and real eye sample images of a target domain, and the simulated eye sample images are obtained by performing transfer learning on virtual eye sample images of a source domain. The first eye region and the second eye region are then filled with the target correction image to obtain a corrected eye image. Through binocular constraints and data domain migration, the eye gaze can be corrected conveniently and accurately, and the correction effect is real and natural.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application; for those skilled in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is a schematic architecture diagram of an image processing system according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an image processing method according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of another image processing method provided in the embodiments of the present application;
fig. 4a is a schematic diagram of an eye image provided in an embodiment of the present application;
fig. 4b is a schematic diagram of another eye image provided by the embodiment of the present application;
fig. 4c is a schematic overall flowchart of an eye gaze correction according to an embodiment of the present disclosure;
FIG. 4d is a schematic overall flow chart of another eye gaze correction provided by embodiments of the present application;
fig. 4e is a schematic overall flowchart of network training provided in the embodiment of the present application;
fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort shall fall within the protection scope of the present application.
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide-area or local-area network to realize the computation, storage, processing, and sharing of data. It is a general term for the network, information, integration, management-platform, and application technologies applied under the cloud-computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud-computing technology will become an important support: background services of technical network systems, such as video websites, picture websites, and web portals, require large amounts of computing and storage resources. With the development of the internet industry, each item may carry its own identification mark that must be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industrial data require strong system-background support, which can only be realized through cloud computing.
Cloud computing, in the narrow sense, refers to a delivery and usage mode of IT infrastructure: obtaining required resources over a network in an on-demand, easily extensible manner. In the broad sense, it refers to a delivery and usage mode of services: obtaining required services over a network in an on-demand, easily extensible manner. Such services may be IT and software services, internet-related services, or other services. Cloud computing is a product of the development and fusion of traditional computing and network technologies such as grid computing, distributed computing, parallel computing, utility computing, network storage, virtualization, and load balancing.
With the diversification of the internet, real-time data streams, and connected devices, and the growing demands of search services, social networks, mobile commerce, and open collaboration, cloud computing has developed rapidly. Unlike earlier parallel and distributed computing, the emergence of cloud computing will conceptually drive revolutionary changes in the whole internet model and in enterprise management.
Big data refers to data sets that cannot be captured, managed, and processed by conventional software tools within a reasonable time range; it is a massive, fast-growing, and diversified information asset that requires new processing modes to yield stronger decision-making power, insight, and process-optimization capability. With the advent of the cloud era, big data has attracted more and more attention, and it requires special techniques to process large amounts of data within a tolerable elapsed time. Technologies suitable for big data include massively parallel processing databases, data mining, distributed file systems, distributed databases, cloud-computing platforms, the internet, and scalable storage systems.
Artificial Intelligence (AI) is a theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that machines have the capabilities of perception, reasoning, and decision-making.
Artificial-intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The basic AI infrastructure generally includes sensors, dedicated AI chips, cloud computing, distributed storage, big-data processing, operation/interaction systems, mechatronics, and the like. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning / deep learning.
Computer Vision (CV) is the science of how to make machines "see": using cameras and computers instead of human eyes to identify, track, and measure targets, and further processing images so that they become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques for building artificial-intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric technologies such as face recognition and fingerprint recognition.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; it is applied in all fields of AI. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration.
The embodiments of the present application mainly relate to cloud computing and big data in the cloud-technology field, and to computer vision and machine learning in the artificial-intelligence field. They can be used to correct the human eye gaze in images; the specific implementations of the present application are described in detail below.
Fig. 1 is a schematic structural diagram of an image processing system according to an embodiment of the present disclosure, where the image processing system may specifically include a computer device 10 and one or more terminal devices 20. Wherein:
computer device 10 may provide services for an online video session, such as an online video conferencing service, and may provide corrective services for the eye during the video session. The terminal device 20 may be equipped with a video conference client, and provide an online video conference function to each corresponding user. The computer device 10 may perform data interaction with one or more terminal devices 20, for example, the terminal devices 20 may submit multimedia data to the computer device 10, where the multimedia data may be real-time video stream data, for example, image data including face images of various participants collected in real time in an online video conference scene, or a picture that needs to be corrected for human eyes.
Specifically, after receiving the image data sent by a terminal device 20, the computer device 10 may obtain the face image in the image data, cut the face image to obtain images of the two eye regions (a left-eye image and a right-eye image), and correct the two eye-region images with the sight line correction network to obtain a corrected left-eye image and a corrected right-eye image. The corrected left-eye and right-eye images are then pasted back into the cut face image, finally yielding a face image whose line of sight is accurately corrected, which is displayed through the client of each terminal device 20. Through binocular constraints and data domain migration, the eye gaze can be corrected conveniently and accurately, the correction effect is real and natural, and when applied to a video session scene the method can improve the realism and immersion of the user's participation.
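The server-side flow above (detect the face, cut out the two eye regions, correct them with the sight line correction network, and paste the results back) can be sketched as a short orchestration. The patent publishes no code, so every callable below is a hypothetical placeholder supplied by the caller; the sketch only fixes the order of the steps.

```python
# Illustrative sketch of the Fig. 1 server-side flow; all callables are
# hypothetical placeholders, not APIs from the patent.
def correct_gaze(frame, detect_eyes, correct, paste_back):
    """Run one video frame through the gaze-correction pipeline."""
    left_region, right_region = detect_eyes(frame)   # cut out both eye images
    corrected = correct(left_region, right_region)   # sight line correction network
    return paste_back(frame, corrected)              # fill regions back into the frame
```

For example, with trivial stand-ins for the three stages, a frame's eye data flows through the pipeline while the original frame is left untouched.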
The computer device 10 may be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal device 20 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted smart terminal, and the like. The terminal device 20 and the computer device 10 may be connected directly or indirectly through wired or wireless communication, and the application is not limited herein.
Fig. 2 is a schematic flow chart of an image processing method according to an embodiment of the present application. The method specifically comprises the following steps:
201. an eye image to be processed is acquired, the eye image including an image of a first eye region and an image of a second eye region.
The eye image to be processed may be any eye image that requires line-of-sight correction: it may come from a picture shot in advance by a camera, or from a frame in a real-time video data stream, such as video captured in real time by a camera in a video-conference scene.
Specifically, the computer device may obtain a face image of the target user and perform key-point detection on it to determine the regions where the user's two eyes are located in the face image. The left-eye region may be recorded as the first eye region and the right-eye region as the second eye region; the image containing the first eye region and the image containing the second eye region are taken from the face image as the eye images to be processed, for example by cutting them out of the face image. As shown in Fig. 4a, the eyes of the target user are not focused on the camera but below it (e.g. on the screen), which may reduce the user's sense of real participation in the video session.
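The cutting step above can be sketched in plain Python. This is a minimal illustration, not the patent's implementation: the patch size, the representation of the image as a list of rows, and the idea of returning each patch's bounding box (so the corrected patch can later be pasted back) are all assumptions for illustration.

```python
def crop_eye_patch(image, center, half_size):
    """Cut a square patch around an eye key-point center from an image given
    as a list of rows, and also return the patch's bounding box so the
    corrected patch can later be pasted back into the same region."""
    cx, cy = center  # column, row of the eye key point
    top, left = cy - half_size, cx - half_size
    bottom, right = cy + half_size, cx + half_size
    patch = [row[left:right] for row in image[top:bottom]]
    return patch, (top, left, bottom, right)

def crop_both_eyes(image, left_eye_center, right_eye_center, half_size=2):
    """Return the paired first-eye and second-eye patches described in step 201."""
    left_patch, left_box = crop_eye_patch(image, left_eye_center, half_size)
    right_patch, right_box = crop_eye_patch(image, right_eye_center, half_size)
    return (left_patch, left_box), (right_patch, right_box)
```

A real system would obtain the two eye centers from a facial key-point detector rather than fixed coordinates.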
In some possible embodiments, the specific implementation manner of acquiring the face image of the target user may be:
the method comprises the steps of obtaining an environment image including a target user, and receiving the environment image sent by a client, wherein the environment image can be obtained by the client through calling a shooting device. After the environment image is acquired, face detection is carried out on the environment image to obtain a face image in the environment image. The client can comprise a video conference client, and a target user can participate in a video conference online.
202. And processing the eye image through a sight line correction network to obtain a target correction image, wherein the sight line correction network is trained on simulated eye sample images and real eye sample images of a target domain, and the simulated eye sample images are obtained by performing transfer learning on virtual eye sample images of a source domain.
The simulated eye sample images are obtained by performing transfer learning on the virtual eye sample images of the source domain, and the sight line correction network is then trained on the simulated eye sample images and the real eye sample images of the target domain. In this way, a large number of source-domain virtual eye sample images can undergo domain migration and be converted into simulated eye sample images for training the sight line correction network; a network trained on such a large amount of data has high accuracy and a real, natural correction effect.
Specifically, the computer device processes the image of the first eye region and the image of the second eye region in the eye image through the trained sight line correction network to obtain the target correction image.
In some possible embodiments, the processing, by the computer device, the eye image through the line-of-sight correction network to obtain the target corrected image may specifically include: and processing the eye image by calling the sight line correction network to obtain a target correction image.
203. And filling the first eye region and the second eye region by using the target correction image to obtain a corrected eye image.
The target correction image output by the sight line correction network may specifically include a corrected image for the first eye region (referred to as the first eye correction image) and a corrected image for the second eye region (referred to as the second eye correction image).
Specifically, the computer device may fill the first eye region with the first eye correction image and the second eye region with the second eye correction image to obtain the corrected eye image. As shown in Fig. 4b, the corrected gaze produces the effect of looking directly at the camera, so the gaze direction of the human eye can be corrected accurately, improving the realism and immersion of the user's participation in the video session scene.
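The filling step of 203 can be sketched as a simple paste-back of a corrected patch into its original bounding box. This is a hypothetical helper for illustration, not code from the patent; it assumes the bounding box recorded when the eye region was cut out.

```python
def fill_region(image, corrected_patch, box):
    """Write a corrected eye patch back into its original bounding box,
    producing the corrected eye image described in step 203.
    `image` is a mutable list of rows; `box` is (top, left, bottom, right)."""
    top, left, bottom, right = box
    for r, row in enumerate(corrected_patch):
        image[top + r][left:left + len(row)] = row  # overwrite the eye region
    return image
```

Calling this once per eye (with the first and second eye correction images) yields the final corrected eye image.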
In the embodiment of the application, an eye image to be processed can be acquired, the eye image comprising an image of a first eye region and an image of a second eye region. The eye image is processed through a sight line correction network to obtain a target correction image; the sight line correction network is trained on simulated eye sample images and real eye sample images of a target domain, and the simulated eye sample images are obtained by performing transfer learning on virtual eye sample images of a source domain. The first eye region and the second eye region are then filled with the target correction image to obtain a corrected eye image. Through binocular constraints and data domain migration, the eye gaze can be corrected conveniently and accurately, and the correction effect is real and natural.
Fig. 3 is a schematic flow chart of another image processing method according to an embodiment of the present application. The method specifically comprises the following steps:
301. a virtual eye sample image of a source domain and a real eye sample image of a target domain are obtained.
In transfer learning, a domain consists of data and the probability distribution that generates the data; D usually denotes a domain and P a probability distribution. There are two basic domains: the Source Domain and the Target Domain. The source domain is the object to be migrated from and generally has sufficient sample data; the target domain is the object ultimately desired. Transfer learning transfers knowledge from the source domain to the target domain; in a specific application, source-domain and target-domain sample data can be collected from the corresponding fields.
In the embodiment of the present application, the source domain may refer to a virtual domain (e.g., a machine domain), and the target domain may refer to a real domain (e.g., a real world domain).
Specifically, a large number of virtual eye sample images may be acquired through machine synthesis. Because real data is difficult and costly to acquire, the real eye sample images may be few in number, for example eye images of real-world people obtained by photography or similar means.
In some possible embodiments, to apply the prior correlation information between the two eyes to the sight line correction process, the virtual eye sample images and the real eye sample images are both paired eye images; that is, each contains both a left-eye image and the corresponding right-eye image. The prior correlation information between the two eyes may also be called constraint information and may specifically include eye size, gaze direction, eye skin color, and the like. Generally, for the same user, the size, gaze direction, and skin color of the left and right eyes are very similar. Constraining the two eyes against each other increases the information exchange between them and ensures that the left and right eyes are consistent in correction effect, so using paired binocular image pairs for sight line correction can improve correction accuracy.
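One natural way to express the binocular constraint described above is a consistency penalty between the network's left-eye and right-eye outputs. The patent does not give a formula, so the mean-absolute form below is an assumption chosen purely for illustration.

```python
def binocular_consistency_loss(left_features, right_features):
    """Penalize disagreement between left- and right-eye feature vectors
    (e.g. predicted gaze direction) so both eyes are corrected coherently.
    Mean-absolute form is an illustrative assumption, not the patent's loss."""
    assert len(left_features) == len(right_features)
    return sum(abs(a - b) for a, b in zip(left_features, right_features)) / len(left_features)
```

A training objective could add this term to the per-eye reconstruction loss, so that a correction that fixes one eye but not the other is penalized.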
302. And processing the virtual eye sample image through a data migration network to obtain a simulated eye sample image.
The data migration network is a data generation network with domain-migration capability: it can convert data of one domain into data of another domain. For example, a machine-synthesized virtual eye sample image can be converted into an image resembling the real domain (recorded as a simulated eye sample image). Converting virtual eye sample images with the data migration network yields a large number of simulated eye sample images usable in the target domain, which can serve as training samples for the sight line correction network. This guarantees a sufficient training data source for the sight line correction network, reduces the acquisition cost of training data, and makes network training more convenient.
In some possible embodiments, the computer device may perform supervised network training by using the virtual eye sample images and the real eye sample images as training data. Specifically, a generation network (denoted as a second generation network) may be initialized; a virtual eye sample image is input into the second generation network to obtain a first predicted eye image; a network loss of the second generation network is determined according to the first predicted eye image and a real eye sample image; the network parameters of the second generation network are adjusted by using the network loss; and when the network loss satisfies a convergence condition, the data migration network is obtained. The second generation network may be a cycle-consistent generative adversarial network (CycleGAN).
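The second generation network described here is a full CycleGAN, which is too large to reproduce; the following toy sketch (a one-parameter linear "generator" on synthetic data — all assumptions for illustration) only demonstrates the loop just described: predict, compute the network loss, adjust the parameters with the loss, and stop when the loss converges:

```python
import numpy as np

# Toy stand-in for the second generation network: a per-pixel linear map
# y = w * x + b, trained with mean-squared error against real samples.
rng = np.random.default_rng(0)
virtual = rng.random((64, 8, 8))   # stand-in "virtual eye sample images"
real = 0.8 * virtual + 0.1         # stand-in "real" targets to learn

w, b, lr = 0.0, 0.0, 0.5
losses = []
for _ in range(200):
    pred = w * virtual + b                       # first predicted eye image
    err = pred - real
    losses.append(float(np.mean(err ** 2)))      # network loss (MSE)
    w -= lr * float(np.mean(2 * err * virtual))  # gradient step on w
    b -= lr * float(np.mean(2 * err))            # gradient step on b
```

After enough iterations the loss approaches zero and the learned parameters approach the true mapping, which is the sense in which "the network loss satisfies a convergence condition" above.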
In some possible embodiments, the processing, by the computer device, the virtual eye sample image through the data migration network to obtain the simulated eye sample image may specifically include: and processing the virtual eye sample image by calling a data migration network to obtain a simulated eye sample image.
303. And training a first generation network by using the real eye sample image and the simulated eye sample image to obtain a sight line correction network.
The sight line correction network has sight line correction capability: it can finely adjust the eye regions in an image so as to adjust the direction of the human gaze.
Specifically, after the simulated eye sample images are obtained through conversion, the computer device can perform network training by using the real eye sample images and the simulated eye sample images to obtain the sight line correction network. For example, a generation network (referred to as a first generation network) is initialized, and the training data and supervision data of the first generation network are determined according to the real eye sample images and the simulated eye sample images. The training data may include one or both of the real eye sample images and the simulated eye sample images: the simulated eye sample images may serve as training data alone, or the real and simulated eye sample images may serve as training data together. For the supervision data, the real eye sample images of the target domain are used. After the training data and supervision data are determined, they can be used to train the first generation network to obtain the sight line correction network. In this way, with the real eye sample images as supervision information and a large number of simulated eye sample images as training data, a sight line correction network capable of correcting the sight line of real-world eye images can be obtained. The first generation network may adopt a Convolutional Neural Network (CNN) or the like.
In some possible embodiments, the specific training mode of the first generation network may include:
inputting the training data into the first generation network to obtain a second predicted eye image; determining the network loss of the first generation network according to the second predicted eye image and the supervision data, that is, the difference between the second predicted eye image and a real eye sample image, where the real eye sample image may be a randomly selected one and the network loss may be calculated with a mean squared error function; adjusting the network parameters of the first generation network by using the network loss; and obtaining the sight line correction network when the network loss satisfies a convergence condition.
In some possible embodiments, if the real eye sample images and the simulated eye sample images are used together as training data, the first generation network may first be trained with the simulated eye sample images and the supervision data to obtain a trained first generation network with basic sight line correction capability. To further improve this capability, the trained first generation network may then be retrained with the real eye sample images and the supervision data to obtain the sight line correction network. This two-stage training improves the correction capability and the correction accuracy of the sight line correction network.
In some possible embodiments, if the real eye sample images and the simulated eye sample images are used together as training data, they may also be used in the same training phase when training the first generation network, rather than being divided into two training phases.
304. An eye image to be processed is acquired, the eye image including an image of a first eye region and an image of a second eye region.
The specific implementation manner of step 304 may refer to the related description of step 201 in the foregoing embodiment, and is not described herein again.
305. And splicing the image of the first eye region and the image of the second eye region through the sight line correction network to obtain spliced image data, and processing the spliced image data to obtain a target correction image.
Specifically, the computer device may stitch the image of the first eye region and the image of the second eye region through the sight line correction network by means of a Concat operation to obtain stitched image data, and process the stitched image data to obtain the target correction image.
In some possible embodiments, the computer device performs a stitching process on the image of the first eye region and the image of the second eye region through the line-of-sight correction network to obtain stitched image data, which may specifically include: and splicing the image of the first eye region and the image of the second eye region based on the Concat operation by calling the sight line correction network to obtain spliced image data.
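As a minimal sketch of the Concat operation (channel-wise concatenation is one common realization; the crop sizes and values below are placeholders, not taken from the text):

```python
import numpy as np

# Stitch the left-eye and right-eye crops along the channel axis, so the
# correction network sees both eyes jointly and can enforce consistency.
left_eye = np.zeros((32, 48, 3))   # H x W x C crop of the first eye region
right_eye = np.ones((32, 48, 3))   # H x W x C crop of the second eye region

stitched = np.concatenate([left_eye, right_eye], axis=-1)  # H x W x 2C
```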
The sight line correction network comprises an Encoder stage and a Decoder stage. In the Encoder stage, the resolution of the image is continuously reduced, details are continuously weakened, and feature abstraction is gradually enhanced; in the Decoder stage, the resolution is continuously increased until the resolution of the input image is restored, and details are continuously recovered.
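A shape-only sketch of the two stages, assuming 2x2 average pooling for the Encoder and nearest-neighbour upsampling for the Decoder (the real network would use learned convolutions; only the resolution behaviour described above is illustrated):

```python
import numpy as np

def encode(x):
    """Encoder-stage sketch: halve the resolution by 2x2 average pooling,
    weakening detail (a real encoder also grows feature abstraction)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def decode(x):
    """Decoder-stage sketch: double the resolution by nearest-neighbour
    upsampling until the input resolution is restored."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

img = np.arange(64.0).reshape(8, 8)
feat = encode(encode(img))   # 8x8 -> 4x4 -> 2x2: detail weakened
out = decode(decode(feat))   # 2x2 -> 4x4 -> 8x8: input resolution restored
```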
306. And filling the first eye region and the second eye region by using the target correction image to obtain a corrected eye image.
307. And sending the corrected eye image to a target terminal device so that the target terminal device displays the corrected eye image through a client.
Specifically, the first eye region and the second eye region can be respectively filled with the target correction image to obtain the corrected eye image, and the corrected eye image is then sent to the target terminal device, which displays it through a client. For example, a client on a first terminal device acquires an environment image including a target user by calling a camera. Using the image processing method provided by the embodiments of the application, a computer device may acquire the eye image of the target user from the environment image, process the images of the first eye region and the second eye region by calling the sight line correction network to obtain the target correction image, fill the first eye region and the second eye region in the environment image with the target correction image to obtain the corrected eye image, and send the corrected eye image to the target terminal device. The user of the target terminal device then sees an eye image in which the target user's sight line has been corrected, which can improve the realism and immersion of the video session.
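The filling step amounts to pasting each corrected eye crop back at the rectangle it was cut from. A minimal sketch (coordinates and sizes are placeholders standing in for the detected eye regions):

```python
import numpy as np

face = np.zeros((128, 128, 3))                 # environment/face image
corrected_left = np.full((20, 30, 3), 0.5)     # corrected first eye region
corrected_right = np.full((20, 30, 3), 0.7)    # corrected second eye region

def fill_region(image, patch, top, left):
    """Overwrite the rectangle starting at (top, left) with the patch,
    leaving the rest of the image unchanged."""
    h, w = patch.shape[:2]
    out = image.copy()
    out[top:top + h, left:left + w] = patch
    return out

corrected = fill_region(face, corrected_left, 40, 20)
corrected = fill_region(corrected, corrected_right, 40, 78)
```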
In the embodiments of the application, a data migration network can be obtained by training on virtual eye sample images of a source domain and real eye sample images of a target domain. The data migration network converts a large number of virtual eye sample images of the source domain into simulated eye sample images usable in the target domain, and a sight line correction network is trained based on the simulated eye sample images and the real eye sample images. The eye image to be processed is processed by the sight line correction network to obtain a target correction image; the first eye region and the second eye region in the eye image are then filled with the target correction image to obtain a corrected eye image, which can be sent to a target terminal device for display through a client. Through binocular constraint and data domain migration, the eye sight line can be corrected conveniently and accurately, the correction effect is real and natural, and the method can improve the realism and immersion of user participation in a video session scene.
In some possible embodiments, fig. 4c shows an overall flow chart of the eye sight line correction provided by the embodiments of the present application. The flow is as follows: obtain a face image; crop the face image to obtain the images of the binocular region (including a left eye image and a right eye image, corresponding to the image of the first eye region and the image of the second eye region); input the images of the binocular region into the sight line correction network; obtain a feature vector of the left eye image and a feature vector of the right eye image through a feature extraction module; splice the two feature vectors; encode and then decode the spliced data; output the corrected left eye image and the corrected right eye image (namely the target correction image) through an output layer; and paste the corrected left and right eye images back into the cropped face image. The result is a face image whose sight line has been accurately corrected, with a real and natural correction effect.
In some possible embodiments, as shown in fig. 4d, it is a schematic overall flow chart of another eye vision correction provided by the embodiments of the present application. The method specifically comprises the following steps:
4011. inputting an original face image: this step may be implemented using a face detection technique, for example, performing face detection on an environment image including the user to determine a face image thereof.
4012. Detecting key points of the human face: the face image obtained in step 4011 is used as input, and the key point information of the face is obtained by using a key point detection technology; for example, a key point detection algorithm based on a basic framework such as ASM (Active Shape Models) or AAM (Active Appearance Models) may be used.
4013. Cutting out local areas of left and right eyes: according to the key point information obtained in step 4012, local regions of the left and right eyes, that is, regions where both eyes of the user are located in the face image, can be obtained by clipping, including the first eye region and the second eye region.
4014. Splicing the left eye and the right eye, and sending the spliced left eye and the right eye into a trained sight line correction network: and splicing the local areas of the left eye and the right eye obtained in the step 4013, and sending the spliced local areas to the sight line correction network.
4015. Outputting left and right eyes and pasting each back into the corresponding region of the input image: the corrected left and right eye local regions (i.e., the above-mentioned target correction image) are pasted back into the original face image.
4016. Outputting a face image after sight correction: and specifically, the information can be sent to the client for display.
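Steps 4012-4013 above can be sketched as cropping each eye's bounding box (plus a margin) from the detected keypoints. The keypoint layout and the margin below are assumptions for illustration; the text does not fix them:

```python
import numpy as np

def crop_eye(image, eye_keypoints, margin=4):
    """Crop a local eye region from the face image using the bounding box
    of that eye's keypoints plus a small margin, clamped to the image."""
    xs, ys = eye_keypoints[:, 0], eye_keypoints[:, 1]
    x0 = max(int(xs.min()) - margin, 0)
    y0 = max(int(ys.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin, image.shape[1])
    y1 = min(int(ys.max()) + margin, image.shape[0])
    return image[y0:y1, x0:x1]

face = np.zeros((100, 100, 3))                        # placeholder face image
left_kpts = np.array([[30, 40], [45, 38], [38, 44]])  # (x, y) eye keypoints
left_crop = crop_eye(face, left_kpts)
```

The two crops obtained this way are then stitched (step 4014) and fed to the sight line correction network.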
In some possible embodiments, as shown in fig. 4e, the overall flow chart of network training provided in the embodiments of the present application is shown. The method specifically comprises the following steps:
4021. A small amount of real binocular data (i.e., the real eye sample images described above) is acquired: to improve the realism of the resulting eyes, data from real scenes, commonly referred to as target domain data, is used in training the network. Because collecting real data is difficult and costly, the scheme uses only a small amount of real target domain data to assist the style migration (i.e., the data migration).
4022. The computer synthesizes a large amount of virtual binocular data (i.e., the virtual eye sample images described above): computers can efficiently generate large amounts of virtual binocular data, which is commonly referred to as source domain data.
4023. Training a style migration network (i.e., the data migration network described above): the target domain data and the source domain data are used as unpaired data to train the style migration network, for example, the source domain data can be used as training data, and the target domain data can be used as supervision data.
4024. Data migration and conversion: the style migration network obtained in step 4023 is used to convert a large amount of virtual binocular data to obtain a large amount of simulated real binocular data (i.e., the simulated eye sample image).
4025. Training a sight correction network: and using the obtained simulated real binocular data to train a binocular vision correction network.
By the network training mode, a large number of training data sources can be provided for training of the sight correction network, the acquisition cost of the training data sources is low, and the sight correction network with high correction accuracy and real and natural correction effect can be trained.
Fig. 5 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application. The device comprises:
an obtaining module 501, configured to obtain an eye image to be processed, where the eye image includes an image of a first eye region and an image of a second eye region.
A processing module 502, configured to process the eye image through a line-of-sight correction network to obtain a target correction image, where the line-of-sight correction network is obtained by training based on a simulated eye sample image and a real eye sample image of a target domain, and the simulated eye sample image is obtained by performing transfer learning on a virtual eye sample image of a source domain.
The processing module 502 is further configured to perform filling processing on the first eye region and the second eye region by using the target corrected image, so as to obtain a corrected eye image.
Optionally, the processing module 502 is specifically configured to:
and filling the first eye region by using the first eye correction image, and filling the second eye region by using the second eye correction image to obtain a corrected eye image.
Optionally, the processing module 502 is specifically configured to:
and splicing the image of the first eye region and the image of the second eye region through a sight line correction network to obtain spliced image data.
And processing the spliced image data through the sight line correction network to obtain a target correction image.
Optionally, the obtaining module 501 is further configured to obtain a virtual eye sample image of a source domain and a real eye sample image of a target domain.
The processing module 502 is further configured to process the virtual eye sample image through a data migration network to obtain a simulated eye sample image.
The processing module 502 is further configured to train a first generation network by using the real eye sample image and the simulated eye sample image, so as to obtain a line-of-sight correction network.
Optionally, the processing module 502 is further configured to train a second generation network by using the virtual eye sample image and the real eye sample image, so as to obtain a data migration network.
Optionally, the processing module 502 is specifically configured to:
and inputting the virtual eye sample image into a second generation network to obtain a first predicted eye image.
Determining a network loss of the second generation network from the first predicted eye image and the real eye sample image.
And adjusting the network parameters of the second generation network by using the network loss to obtain the data migration network.
Optionally, the processing module 502 is specifically configured to:
determining training data and supervision data of a first generation network according to the real eye sample image and the simulated eye sample image, wherein the training data comprises one or two of the real eye sample image and the simulated eye sample image, and the supervision data comprises the real eye sample image.
And training the first generation network by using the training data and the supervision data to obtain the sight line correction network.
Optionally, the processing module 502 is specifically configured to:
and inputting the training data into a first generation network to obtain a second predicted eye image.
Determining a network loss of the first generated network from the second predicted eye image and the supervision data.
And adjusting the network parameters of the first generation network by using the network loss to obtain the sight line correction network.
Optionally, the training data includes the real eye sample image and the simulated eye sample image, and the processing module 502 is specifically configured to:
and training the first generation network by using the simulated eye sample image and the supervision data to obtain the trained first generation network.
And training the trained first generation network by using the real eye sample image and the supervision data to obtain a sight line correction network.
Optionally, the obtaining module 501 is specifically configured to:
and acquiring a face image of the target user.
And determining a first eye region and a second eye region in the face image by performing key point detection on the face image.
And taking the image of the first eye region and the image of the second eye region in the face image as eye images to be processed.
Optionally, the obtaining module 501 is specifically configured to:
the method comprises the steps of obtaining an environment image including a target user, wherein the environment image is obtained by calling a shooting device through a client.
And carrying out face detection on the environment image to obtain a face image in the environment image.
Optionally, the apparatus further comprises:
a sending module 503, configured to send the corrected eye image to a target terminal device, so that the target terminal device displays the corrected eye image through a client.
It should be noted that the functions of the functional modules of the image processing apparatus in the embodiment of the present application may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device includes a processor 601, a storage device 602, and a network interface 603, as well as a power supply module and the like. The processor 601, the storage device 602, and the network interface 603 may exchange data with one another.
The storage device 602 may include a volatile memory (volatile memory), such as a random-access memory (RAM); the storage device 602 may also include a non-volatile memory (non-volatile memory), such as a flash memory (flash memory), a solid-state drive (SSD), etc.; the storage means 602 may also comprise a combination of memories of the kind described above.
The processor 601 may be a Central Processing Unit (CPU) 601. In one embodiment, the processor 601 may also be a Graphics Processing Unit (GPU) 601. The processor 601 may also be a combination of a CPU and a GPU. In one embodiment, the storage device 602 is used to store program instructions. The processor 601 may call the program instructions to perform the following operations:
an eye image to be processed is acquired, the eye image including an image of a first eye region and an image of a second eye region.
And processing the eye images through a sight line correction network to obtain target correction images, wherein the sight line correction network is obtained by training based on simulated eye sample images and real eye sample images of a target domain, and the simulated eye sample images are obtained by performing transfer learning on virtual eye sample images of a source domain.
And filling the first eye region and the second eye region by using the target correction image to obtain a corrected eye image.
Optionally, the processor 601 is specifically configured to:
and filling the first eye region by using the first eye correction image, and filling the second eye region by using the second eye correction image to obtain a corrected eye image.
Optionally, the processor 601 is specifically configured to:
and splicing the image of the first eye region and the image of the second eye region through a sight line correction network to obtain spliced image data.
And processing the spliced image data through the sight line correction network to obtain a target correction image.
Optionally, the processor 601 is further configured to:
a virtual eye sample image of a source domain and a real eye sample image of a target domain are obtained.
And processing the virtual eye sample image through a data migration network to obtain a simulated eye sample image.
And training a first generation network by using the real eye sample image and the simulated eye sample image to obtain a sight line correction network.
Optionally, the processor 601 is further configured to train a second generation network by using the virtual eye sample image and the real eye sample image, so as to obtain a data migration network.
Optionally, the processor 601 is specifically configured to:
and inputting the virtual eye sample image into a second generation network to obtain a first predicted eye image.
Determining a network loss of the second generation network from the first predicted eye image and the real eye sample image.
And adjusting the network parameters of the second generation network by using the network loss to obtain the data migration network.
Optionally, the processor 601 is specifically configured to:
determining training data and supervision data of a first generation network according to the real eye sample image and the simulated eye sample image, wherein the training data comprises one or two of the real eye sample image and the simulated eye sample image, and the supervision data comprises the real eye sample image.
And training the first generation network by using the training data and the supervision data to obtain the sight line correction network.
Optionally, the processor 601 is specifically configured to:
and inputting the training data into a first generation network to obtain a second predicted eye image.
Determining a network loss of the first generated network from the second predicted eye image and the supervision data.
And adjusting the network parameters of the first generation network by using the network loss to obtain the sight line correction network.
Optionally, the training data includes the real eye sample image and the simulated eye sample image, and the processor 601 is specifically configured to:
and training the first generation network by using the simulated eye sample image and the supervision data to obtain the trained first generation network.
And training the trained first generation network by using the real eye sample image and the supervision data to obtain a sight line correction network.
Optionally, the processor 601 is specifically configured to:
and acquiring a face image of the target user.
And determining a first eye region and a second eye region in the face image by performing key point detection on the face image.
And taking the image of the first eye region and the image of the second eye region in the face image as eye images to be processed.
Optionally, the processor 601 is specifically configured to:
the method comprises the steps of obtaining an environment image including a target user, wherein the environment image is obtained by calling a shooting device through a client.
And carrying out face detection on the environment image to obtain a face image in the environment image.
Optionally, the processor 601 is further configured to:
and calling a network interface 603 to send the corrected eye image to a target terminal device, so that the target terminal device displays the corrected eye image through a client.
In a specific implementation, the processor 601, the storage device 602, and the network interface 603 described in this embodiment of the present application may perform the implementation described in the related embodiment of the image processing method provided in fig. 2 or fig. 3 in this embodiment of the present application, and may also perform the implementation described in the related embodiment of the image processing device provided in fig. 5 in this embodiment of the present application, which is not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing relevant hardware. The program includes one or more instructions that can be stored in a computer storage medium, and when executed, the program may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps performed in the embodiments of the methods described above.
The above disclosure is only a few examples of the present application, and certainly should not be taken as limiting the scope of the present application, which is therefore intended to cover all modifications that are within the scope of the present application and which are equivalent to the claims.

Claims (15)

1. An image processing method, characterized in that the method comprises:
acquiring an eye image to be processed, wherein the eye image comprises an image of a first eye region and an image of a second eye region;
processing the eye images through a sight line correction network to obtain target correction images, wherein the sight line correction network is obtained based on simulated eye sample images and real eye sample images of a target domain through training, and the simulated eye sample images are obtained through migration learning of virtual eye sample images of a source domain;
and filling the first eye region and the second eye region by using the target correction image to obtain a corrected eye image.
2. The method of claim 1, wherein the processing of the eye image through the line-of-sight correction network to obtain the target corrected image comprises:
splicing the image of the first eye region and the image of the second eye region through a sight line correction network to obtain spliced image data;
and processing the spliced image data through the sight line correction network to obtain a target correction image.
3. The method according to claim 1 or 2, wherein before processing the eye image through the line-of-sight correction network to obtain the target corrected image, the method further comprises:
acquiring a virtual eye sample image of a source domain and a real eye sample image of a target domain;
processing the virtual eye sample image through a data migration network to obtain a simulated eye sample image;
and training a first generation network by using the real eye sample image and the simulated eye sample image to obtain a sight line correction network.
4. The method of claim 3, wherein before processing the virtual eye sample image through the data migration network to obtain the simulated eye sample image of the target domain, the method further comprises:
and training a second generation network by using the virtual eye sample image and the real eye sample image to obtain a data migration network.
5. The method of claim 4, wherein training a second generation network using the virtual eye sample image and the real eye sample image to obtain a data migration network comprises:
inputting the virtual eye sample image into a second generation network to obtain a first predicted eye image;
determining a network loss of the second generation network from the first predicted eye image and the real eye sample image;
and adjusting the network parameters of the second generation network by using the network loss to obtain the data migration network.
6. The method of claim 3, wherein training a first generation network using the real eye sample images and the simulated eye sample images to obtain a gaze correction network comprises:
determining training data and supervision data of a first generation network according to the real eye sample image and the simulated eye sample image, wherein the training data comprises one or two of the real eye sample image and the simulated eye sample image, and the supervision data comprises the real eye sample image;
and training the first generation network by using the training data and the supervision data to obtain the sight line correction network.
7. The method of claim 6, wherein training the first generative network using the training data and the supervisory data to obtain a line-of-sight correction network comprises:
inputting the training data into a first generation network to obtain a second predicted eye image;
determining a network loss of the first generated network from the second predicted eye image and the supervision data;
and adjusting the network parameters of the first generation network by using the network loss to obtain the sight line correction network.
8. The method of claim 6, wherein the training data comprises the real eye sample images and the simulated eye sample images, and wherein training the first generated network using the training data and the supervision data to obtain the gaze correction network comprises:
training a first generation network by using the simulated eye sample image and the supervision data to obtain a trained first generation network;
and training the trained first generation network by using the real eye sample image and the supervision data to obtain a sight line correction network.
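The two-stage schedule of claim 8 (pre-train on simulated samples, then fine-tune the trained network on real samples against the same supervision data) can be sketched as follows. This is a hypothetical toy, with the "first generation network" reduced to a scalar `w` fitted by SGD on a squared error; the sample values are invented for illustration.

```python
# Hypothetical sketch of the claim-8 two-stage training schedule
# (illustrative only; not the patented network or loss).

def train(w, samples, supervision, lr=0.1, epochs=20):
    # One training phase: nudge w so that w * sample matches supervision.
    for _ in range(epochs):
        for x, y in zip(samples, supervision):
            grad = 2 * (w * x - y) * x   # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

simulated = [0.5, 1.0]       # simulated eye sample images (after migration)
real = [0.45, 1.1]           # real eye sample images (target domain)
supervision = [0.9, 1.8]     # supervision data (real eye sample images)

w = train(0.0, simulated, supervision)   # stage 1: pre-train on simulated data
w = train(w, real, supervision)          # stage 2: fine-tune on real data
```

The design point the claim encodes is the ordering: the cheaper, more abundant simulated data establishes the mapping first, and the scarcer real data then adapts it to the target domain.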
9. The method of claim 1, wherein the acquiring the eye image to be processed comprises:
acquiring a face image of a target user;
determining a first eye region and a second eye region in the face image by performing key point detection on the face image;
and taking the image of the first eye region and the image of the second eye region in the face image as eye images to be processed.
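The region extraction of claim 9 can be sketched as below, under stated assumptions: the key-point detector is stubbed out as fixed landmark coordinates (a real system would obtain them from a facial-landmark model), and each eye region is an axis-aligned box around its landmarks with a small margin.

```python
# Sketch of the claim-9 eye-region cropping; landmark coordinates are
# assumed stand-ins for a real key-point detector's output.

def eye_box(landmarks, margin=1):
    # Bounding box (x0, y0, x1, y1) around the detected eye key points.
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

def crop(image, box):
    # Cut the eye region out of the face image (row-major 2D list).
    x0, y0, x1, y1 = box
    return [row[x0:x1 + 1] for row in image[y0:y1 + 1]]

face = [[y * 10 + x for x in range(10)] for y in range(10)]  # 10x10 toy image
left_eye_pts = [(2, 3), (3, 3), (4, 3)]    # assumed detector output
right_eye_pts = [(6, 3), (7, 3), (8, 3)]   # assumed detector output

left_crop = crop(face, eye_box(left_eye_pts))    # image of first eye region
right_crop = crop(face, eye_box(right_eye_pts))  # image of second eye region
```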
10. The method of claim 9, wherein the obtaining the facial image of the target user comprises:
acquiring an environment image including a target user, wherein the environment image is obtained by calling a shooting device by a client;
and carrying out face detection on the environment image to obtain a face image in the environment image.
11. The method of claim 1, further comprising:
and sending the corrected eye image to a target terminal device so that the target terminal device displays the corrected eye image through a client.
12. An image processing apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire an eye image to be processed, wherein the eye image comprises an image of a first eye region and an image of a second eye region;
the processing module is used for processing the eye images through a sight line correction network to obtain target correction images, the sight line correction network is obtained based on simulated eye sample images and real eye sample images of a target domain through training, and the simulated eye sample images are obtained through migration learning of virtual eye sample images of a source domain;
the processing module is further configured to perform filling processing on the first eye region and the second eye region by using the target correction image to obtain a corrected eye image.
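The filling step performed by the processing module can be sketched as a patch write-back. This is a hypothetical toy: the target correction image is a small corrected eye patch, and the eye region's top-left coordinate is an assumed input (in practice it would come from the earlier key-point detection).

```python
# Hypothetical sketch of the filling step: write the target correction
# image back into an eye region of the face image (coordinates assumed).

def fill_region(image, patch, top_left):
    # Copy the face image, then overwrite the eye region with the patch.
    x0, y0 = top_left
    out = [row[:] for row in image]
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            out[y0 + dy][x0 + dx] = pixel
    return out

face = [[0] * 8 for _ in range(8)]        # toy 8x8 face image
corrected_eye = [[9, 9], [9, 9]]          # toy target correction image

result = fill_region(face, corrected_eye, top_left=(2, 3))
```

Production systems would usually blend the patch edges rather than hard-overwrite them, but the claimed operation is this region fill.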
13. A computer device, characterized in that the computer device comprises a processor, a network interface and a storage device connected to each other, wherein the network interface is controlled by the processor to transmit and receive data, the storage device is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to execute the image processing method according to any one of claims 1 to 11.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the image processing method according to any one of claims 1 to 11.
15. A computer program product comprising computer instructions stored in a computer-readable storage medium, wherein a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, thereby causing the computer device to perform the image processing method according to any one of claims 1 to 11.
CN202110600889.7A 2021-05-31 2021-05-31 Image processing method, related equipment and computer readable storage medium Pending CN113822114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110600889.7A CN113822114A (en) 2021-05-31 2021-05-31 Image processing method, related equipment and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN113822114A true CN113822114A (en) 2021-12-21

Family

ID=78923841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110600889.7A Pending CN113822114A (en) 2021-05-31 2021-05-31 Image processing method, related equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113822114A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109840019A (en) * 2019-02-22 2019-06-04 网易(杭州)网络有限公司 Control method, device and the storage medium of virtual portrait
CN109840019B (en) * 2019-02-22 2023-01-10 网易(杭州)网络有限公司 Virtual character control method, device and storage medium
CN116382475A (en) * 2023-03-24 2023-07-04 北京百度网讯科技有限公司 Sight line direction control, sight line communication method, device, equipment and medium
CN116382475B (en) * 2023-03-24 2024-05-14 北京百度网讯科技有限公司 Sight line direction control, sight line communication method, device, equipment and medium

Similar Documents

Publication Publication Date Title
Zhang et al. Uncertainty inspired RGB-D saliency detection
EP3876140B1 (en) Method and apparatus for recognizing postures of multiple persons, electronic device, and storage medium
CN111401216B (en) Image processing method, model training method, image processing device, model training device, computer equipment and storage medium
US20220084163A1 (en) Target image generation method and apparatus, server, and storage medium
CN111553267B (en) Image processing method, image processing model training method and device
CN112733795B (en) Method, device and equipment for correcting sight of face image and storage medium
US20220222796A1 (en) Image processing method and apparatus, server, and storage medium
US20230077356A1 (en) Method, apparatus, electronic device, and storage medium for processing image
CN109754464B (en) Method and apparatus for generating information
CN113822114A (en) Image processing method, related equipment and computer readable storage medium
WO2023184817A1 (en) Image processing method and apparatus, computer device, computer-readable storage medium, and computer program product
CN113705295A (en) Object posture migration method, device, equipment and storage medium
CN113393544A (en) Image processing method, device, equipment and medium
CN116630514A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN110427864B (en) Image processing method and device and electronic equipment
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
CN113538254A (en) Image restoration method and device, electronic equipment and computer readable storage medium
CN116665274A (en) Face driving method and device
CN115035219A (en) Expression generation method and device and expression generation model training method and device
CN115775300A (en) Reconstruction method of human body model, training method and device of human body reconstruction model
KR20230141429A (en) Image processing methods and apparatus, computer devices, computer-readable storage media, and computer program products
CN117011449A (en) Reconstruction method and device of three-dimensional face model, storage medium and electronic equipment
CN113569809A (en) Image processing method, device and computer readable storage medium
CN114943799A (en) Face image processing method and device and computer readable storage medium
CN116310615A (en) Image processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination