CN116416656A - Image processing method, device and storage medium based on under-screen image - Google Patents

Image processing method, device and storage medium based on under-screen image

Info

Publication number
CN116416656A
Authority
CN
China
Prior art keywords
model
image
face
image processing
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111645175.4A
Other languages
Chinese (zh)
Inventor
周俊伟
宋小刚
刘小伟
陈兵
王国毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202111645175.4A priority Critical patent/CN116416656A/en
Priority to PCT/CN2022/118604 priority patent/WO2023124237A1/en
Publication of CN116416656A publication Critical patent/CN116416656A/en



Classifications

    • G06T5/73
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris

Abstract

The embodiment of the application provides an image processing method, device and storage medium based on an under-screen image, and relates to the technical field of artificial intelligence. When the image processing model is a model trained on images output by the AI preprocessing model, an AI preprocessing model with better output image quality can be trained first, and the output of the AI preprocessing model is then used to construct a data set for training an image processing model matched with the AI preprocessing model. After the image processing model is trained on this data set, the output of the AI preprocessing model better meets the input requirements of the image processing model. Therefore, an under-screen image is input into the AI preprocessing model to obtain a processed image, the processed image is then input into the image processing model, and a better processing effect can be obtained. In this way, the terminal device can achieve better image processing without opening a hole in the screen, which increases the flexibility of the appearance design of the terminal device.

Description

Image processing method, device and storage medium based on under-screen image
Technical Field
The present disclosure relates to the field of artificial intelligence (artificial intelligence, AI) technologies, and in particular, to an image processing method, apparatus, and storage medium based on an under-screen image.
Background
With the development of terminal technology, the functions of terminal devices are more and more diversified. For example, the terminal device may have functions of face unlocking, face payment, gesture unlocking, gesture payment, and the like. When the terminal device realizes the above-described function, the terminal device generally needs to take an image and realize the above-described function through an AI model for processing the image.
To implement these functions, the AI model generally needs to receive a relatively high-quality image, for example, an image with better definition or better brightness. This is because, when the AI model processes an image, face recognition, gesture recognition, and the like generally need to be implemented based on feature information of the image; in a low-quality image, this feature information is missing or not obvious, which affects the recognition accuracy of the AI model, and in turn the terminal device implements these functions with low accuracy and poor effect.
Therefore, in a typical implementation, a hole needs to be opened in the screen of the terminal device so that the camera sensor is not blocked by the screen when receiving the optical signal, thereby obtaining a relatively high-quality image. However, opening holes in the screen limits the flexibility of the appearance design of the terminal device and may also affect the visual experience of users who prefer a complete screen.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, and a storage medium based on an under-screen image, which can obtain a better image recognition effect based on an image captured by a camera disposed below the screen. The method can therefore be applied to a terminal device whose screen has no opening, so that the design of the screen is not constrained by the image recognition function.
In a first aspect, an embodiment of the present application provides an image processing method based on an under-screen image. The method includes: acquiring an under-screen image; inputting the under-screen image into a pre-trained artificial intelligence (AI) preprocessing model to obtain a processed image; and inputting the processed image into an image processing model to obtain a processing result. When the image processing model is a model trained on images output by the AI preprocessing model, the AI preprocessing model is a model trained using a first training data set, the first training data set includes first test data and first sample data, a test image in the first test data corresponds to a sample image in the first sample data, and the image quality of the test image is lower than that of the sample image. The image processing model is obtained by inputting a second training data set into the AI preprocessing model and training with the output of the AI preprocessing model, and the second training data set includes a data set related to the function that the image processing model needs to implement.
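As an illustration only, the following is a minimal sketch in PyTorch-style Python of the inference flow described in this aspect; the model objects, the tensor shape, and the function name are assumptions for the example and are not specified by the application.

import torch

def recognize_from_underscreen_image(raw_image: torch.Tensor,
                                      preprocess_model: torch.nn.Module,
                                      image_processing_model: torch.nn.Module) -> torch.Tensor:
    # raw_image: under-screen image tensor, e.g. shape (1, C, H, W).
    preprocess_model.eval()
    image_processing_model.eval()
    with torch.no_grad():
        # Step 1: improve the quality of the under-screen image.
        processed_image = preprocess_model(raw_image)
        # Step 2: run the downstream task (e.g. face recognition) on the processed image.
        result = image_processing_model(processed_image)
    return result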
In the embodiment of the application, when the image processing model is a model trained on images output by the AI preprocessing model, an AI preprocessing model with better output image quality can be trained first. The output of the AI preprocessing model is then used to construct a data set for training an image processing model matched with the AI preprocessing model. After the image processing model is trained on this data set, the output of the AI preprocessing model better meets the input requirements of the image processing model. Therefore, the under-screen image is input into the AI preprocessing model to obtain a processed image, and the processed image is then input into the pre-trained image processing model of the embodiment of the application, so that a better processing effect can be obtained.
In one possible implementation, the under-screen image is an image captured by a camera disposed under the screen. In this way, the terminal device can achieve better image processing without perforating the screen, which increases the flexibility of the appearance design of the terminal device.
In a possible implementation manner, the test image is an image captured by a camera disposed under the screen, or the test image is an image obtained by performing degradation processing on the sample data. When the test image is captured by a camera disposed under the screen, the test image is highly similar to the images captured by an actual terminal device, which helps train a model with a better recognition effect. When the test image is obtained by degrading the sample data, no specific device is needed to acquire test images, and a large number of test images can be obtained more easily from higher-quality sample data.
In one possible implementation, the camera includes a time-of-flight (TOF) camera.
In one possible implementation, the degradation processing includes one or more of the following: adding Newton rings, adding diffraction spots, reducing gray-scale values, or increasing image blur. In this way, an under-screen image similar to one actually captured by an under-screen camera can be obtained through degradation processing, which helps train a model with a good recognition effect.
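As an illustration only, a minimal sketch of such degradation processing on a grayscale sample image, using OpenCV and NumPy; the blur kernel, the gray-scale attenuation factor, and the synthetic ring and spot patterns are assumed values for the example and are not taken from the application.

import cv2
import numpy as np

def degrade_sample(image: np.ndarray) -> np.ndarray:
    # Simulate under-screen degradation on a high-quality grayscale sample (uint8).
    h, w = image.shape[:2]
    out = image.astype(np.float32)

    # 1) Reduce gray-scale values (lower overall brightness); 0.6 is an assumed factor.
    out *= 0.6

    # 2) Increase image blur.
    out = cv2.GaussianBlur(out, (9, 9), sigmaX=3.0)

    # 3) Add a Newton-ring-like interference pattern (concentric rings).
    yy, xx = np.mgrid[0:h, 0:w]
    r2 = (yy - h / 2) ** 2 + (xx - w / 2) ** 2
    out += 10.0 * np.cos(r2 / 800.0)  # assumed ring frequency and amplitude

    # 4) Add a few bright diffraction-spot artifacts at random positions.
    rng = np.random.default_rng(0)
    for _ in range(5):
        cx, cy = int(rng.integers(0, w)), int(rng.integers(0, h))
        cv2.circle(out, (cx, cy), radius=3, color=60.0, thickness=-1)

    return np.clip(out, 0, 255).astype(np.uint8)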
In one possible implementation, the image processing model includes one or more of the following: a face recognition model, an open-close eye recognition model, a human eye gaze recognition model, or a face anti-counterfeiting model. Therefore, when the AI preprocessing model is combined with the image processing model to implement functions such as face unlocking, the image processing model can obtain better output precision. It can be understood that, when applied to face unlocking, the image processing model may also be referred to as a face unlocking related model.
In a possible implementation manner, when the AI preprocessing model is a model obtained by joint training with the image processing model, the parameters of the image processing model are not adjustable during the training of the AI preprocessing model, the parameters of the AI preprocessing model are adjustable, and the AI preprocessing model completes training when the value calculated by a target loss function converges. The target loss function is related to the loss function of the AI preprocessing model and the loss function of the image processing model. In this way, when training the AI preprocessing model, the image processing model that will be used together with it is jointly involved, and the output of the image processing model acts as a feedback factor for training the AI preprocessing model. The output of the trained AI preprocessing model can thus better meet the requirements of the image processing model, so that inputting an under-screen image into the pre-trained AI preprocessing model of the embodiment of the application and then inputting the processed image into the image processing model yields a better processing effect.
In one possible implementation manner, there are a plurality of image processing models, and when the target loss function is calculated, the difference between the weights of any two of the loss functions (the loss function of the AI preprocessing model and the loss functions of the image processing models) is smaller than a preset value. In this way, the weights of the image processing models during training of the AI preprocessing model are similar, and the trained AI preprocessing model can be used well in combination with each of the image processing models.
In a possible implementation manner, when the image processing model includes a face recognition model, an open-close eye recognition model, a human eye gaze recognition model and a face anti-counterfeiting model, the target loss function satisfies the following formula:
L_total = α·L_c + β·L_F + γ·L_G + θ·L_E + τ·L_R
where L_c is the loss function of the AI preprocessing model, L_F is the loss function of the face recognition model, L_G is the loss function of the human eye gaze recognition model, L_E is the loss function of the open-close eye recognition model, L_R is the loss function of the face anti-counterfeiting model, and α, β, γ, θ and τ are all preset constants. Therefore, when the AI preprocessing model is combined with the image processing models to implement functions such as face unlocking, the image processing models can obtain better output precision. The embodiment of the application can reuse conventional existing image processing models, and subsequent joint training or use with the AI preprocessing model does not require adjusting the parameters of the image processing models. In this way, the number of models that need to be trained can be reduced.
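As an illustration only, a minimal PyTorch-style sketch of one step of this joint training; the loss weights, the use of an L1 reconstruction term for L_c, and the dictionary structure are assumptions for the example, and freezing the downstream models reflects the statement that their parameters are not adjustable.

import torch

def joint_train_step(preprocess_model, frozen_models, losses, weights,
                     degraded_batch, clean_batch, labels, optimizer):
    # frozen_models: dict of downstream models, e.g. {"face": ..., "gaze": ..., "eye": ..., "spoof": ...}
    # losses: dict of matching loss functions; weights: preset constants, plus weights["pre"] for L_c.
    for m in frozen_models.values():
        m.requires_grad_(False)  # image processing models are fixed during this stage
        m.eval()

    restored = preprocess_model(degraded_batch)

    # L_c: the AI preprocessing model's own loss (L1 reconstruction used here only as an example).
    total = weights["pre"] * torch.nn.functional.l1_loss(restored, clean_batch)

    # βL_F + γL_G + θL_E + τL_R: task losses computed on the restored image.
    for name, model in frozen_models.items():
        total = total + weights[name] * losses[name](model(restored), labels[name])

    optimizer.zero_grad()
    total.backward()  # gradients flow only into the AI preprocessing model
    optimizer.step()
    return total.item()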
In one possible implementation, the number and variety of image processing models are configurable. In this way, the terminal device can flexibly configure the number and kind of image processing models based on environment recognition or user settings to meet diversified demands.
In a possible implementation manner, the method further includes: displaying a first interface, where the first interface includes identifiers of a plurality of face unlocking modes, and each identifier corresponds to a control. When a trigger on the target control corresponding to the identifier of a target face unlocking mode among the plurality of face unlocking modes is received, the image processing model is set to the model corresponding to the target face unlocking mode. In this way, the user can flexibly select a desired face unlocking mode, which better meets user requirements.
In one possible implementation, the plurality of face unlocking modes includes a plurality of the following: a standard mode, a mask mode, a strict mode, or a custom mode. The image processing model corresponding to the standard mode includes a face recognition model and a face anti-counterfeiting model. The image processing model corresponding to the mask mode includes an open-close eye recognition model, a human eye gaze recognition model and a face anti-counterfeiting model. The image processing model corresponding to the strict mode includes a face recognition model, an open-close eye recognition model, a human eye gaze recognition model and a face anti-counterfeiting model. The image processing model corresponding to the custom mode includes one or more of a face recognition model, an open-close eye recognition model, a human eye gaze recognition model or a face anti-counterfeiting model. In this way, the user can select a suitable mode for face unlocking according to the environment in which the user is located.
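As an illustration only, a minimal sketch of how this mode-to-model mapping could be configured; the mode names and model names follow the description above, but the dictionary structure and the select_models function are assumptions.

# Mapping from face unlocking mode to the set of image processing models it uses.
FACE_UNLOCK_MODES = {
    "standard": ["face_recognition", "face_anti_counterfeiting"],
    "mask":     ["open_close_eye", "eye_gaze", "face_anti_counterfeiting"],
    "strict":   ["face_recognition", "open_close_eye", "eye_gaze", "face_anti_counterfeiting"],
    # "custom" is filled in from user settings with any subset of the four models.
}

def select_models(mode, custom_selection=None):
    # Return the image processing models configured for the chosen unlocking mode.
    if mode == "custom":
        return custom_selection or []
    return FACE_UNLOCK_MODES[mode]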
In a second aspect, an embodiment of the present application provides an image processing apparatus, where the image processing apparatus may be a terminal device, or may be a chip or a chip system in the terminal device. The image processing apparatus may include a display unit and a processing unit. When the image processing apparatus is a terminal device, the display unit may be a display screen. The display unit is configured to perform the step of displaying, so that the terminal device implements the display-related method described in the first aspect or any one of the possible implementation manners of the first aspect, and the processing unit is configured to implement the processing-related method in the first aspect or any one of the possible implementation manners of the first aspect. When the image processing apparatus is a terminal device, the processing unit may be a processor. The image processing apparatus may further include a storage unit, which may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the terminal device implements a method described in the first aspect or any one of possible implementation manners of the first aspect. When the image processing apparatus is a chip or a chip system in a terminal device, the processing unit may be a processor. The processing unit executes instructions stored by the storage unit to cause the terminal device to implement a method as described in the first aspect or any one of the possible implementations of the first aspect. The memory unit may be a memory unit (e.g., a register, a cache, etc.) in the chip, or a memory unit (e.g., a read-only memory, a random access memory, etc.) located outside the chip in the terminal device.
The processing unit is configured to acquire an under-screen image, input the under-screen image into a pre-trained artificial intelligence (AI) preprocessing model to obtain a processed image, and input the processed image into an image processing model to obtain a processing result. When the image processing model is a model trained on images output by the AI preprocessing model, the AI preprocessing model is a model trained using a first training data set, the first training data set includes first test data and first sample data, a test image in the first test data corresponds to a sample image in the first sample data, and the image quality of the test image is lower than that of the sample image. The image processing model is obtained by inputting a second training data set into the AI preprocessing model and training with the output of the AI preprocessing model, and the second training data set includes a data set related to the function that the image processing model needs to implement.
In one possible implementation, the under-screen image is an image captured by a camera disposed under the screen. The test image is an image captured by a camera disposed under the screen, or an image obtained by performing degradation processing on the sample data.
In one possible implementation, the camera includes a time-of-flight (TOF) camera, and the degradation processing includes one or more of the following: adding Newton rings, adding diffraction spots, reducing gray-scale values, or increasing image blur.
In one possible implementation, the image processing model includes one or more of the following: a face recognition model, an open-close eye recognition model, a human eye gaze recognition model, or a face anti-counterfeiting model. Therefore, when the AI preprocessing model is combined with the image processing model to implement functions such as face unlocking, the image processing model can obtain better output precision. It can be understood that, when applied to face unlocking, the image processing model may also be referred to as a face unlocking related model.
In a possible implementation manner, when the AI preprocessing model is a model obtained by joint training with the image processing model, the parameters of the image processing model are not adjustable during the training of the AI preprocessing model, the parameters of the AI preprocessing model are adjustable, and the AI preprocessing model completes training when the value calculated by a target loss function converges. The target loss function is related to the loss function of the AI preprocessing model and the loss function of the image processing model.
In one possible implementation manner, there are a plurality of image processing models, and when the target loss function is calculated, the difference between the weights of any two of the loss functions (the loss function of the AI preprocessing model and the loss functions of the image processing models) is smaller than a preset value.
In a possible implementation manner, when the image processing model includes a face recognition model, an open-close eye recognition model, a human eye gaze recognition model and a face anti-counterfeiting model, the target loss function satisfies the following formula:
L_total = α·L_c + β·L_F + γ·L_G + θ·L_E + τ·L_R
where L_c is the loss function of the AI preprocessing model, L_F is the loss function of the face recognition model, L_G is the loss function of the human eye gaze recognition model, L_E is the loss function of the open-close eye recognition model, L_R is the loss function of the face anti-counterfeiting model, and α, β, γ, θ and τ are all preset constants.
In one possible implementation, the number and variety of image processing models is configurable.
In a possible implementation manner, the display unit is configured to display a first interface, where the first interface includes identifiers of a plurality of face unlocking modes, and each identifier corresponds to a control. When the display unit receives a trigger on the target control corresponding to the identifier of a target face unlocking mode among the plurality of face unlocking modes, the processing unit is configured to set the image processing model to the model corresponding to the target face unlocking mode.
In one possible implementation, the plurality of face unlocking modes includes a plurality of the following: a standard mode, a mask mode, a strict mode, or a custom mode. The image processing model corresponding to the standard mode includes a face recognition model and a face anti-counterfeiting model. The image processing model corresponding to the mask mode includes an open-close eye recognition model, a human eye gaze recognition model and a face anti-counterfeiting model. The image processing model corresponding to the strict mode includes a face recognition model, an open-close eye recognition model, a human eye gaze recognition model and a face anti-counterfeiting model. The image processing model corresponding to the custom mode includes one or more of a face recognition model, an open-close eye recognition model, a human eye gaze recognition model or a face anti-counterfeiting model.
In a third aspect, embodiments of the present application provide an electronic device comprising a processor and a memory, the memory being for storing code instructions, the processor being for executing the code instructions to perform the method described in the first aspect or any one of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program or instructions are stored which, when run on a computer, cause the computer to perform the image processing method based on an under-screen image described in the first aspect or any one of the possible implementations of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the image processing method based on an under-screen image described in the first aspect or any one of the possible implementations of the first aspect.
In a sixth aspect, the present application provides a chip or chip system comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by wires, the at least one processor being adapted to execute a computer program or instructions to perform the method of image processing based on an under-screen image described in the first aspect or any one of the possible implementations of the first aspect. The communication interface in the chip can be an input/output interface, a pin, a circuit or the like.
In one possible implementation, the chip or chip system described above in the present application further includes at least one memory, where the at least one memory has instructions stored therein. The memory may be a memory unit within the chip, such as a register or a cache, or may be a memory unit outside the chip (e.g., a read-only memory, a random access memory, etc.).
It should be understood that, the second aspect to the sixth aspect of the present application correspond to the technical solutions of the first aspect of the present application, and the beneficial effects obtained by each aspect and the corresponding possible embodiments are similar, and are not repeated.
Drawings
FIG. 1 is a schematic view of a scene to which the embodiments of the present application are applied;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 3 is a schematic software architecture of an electronic device according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of model training according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of model training according to an embodiment of the present disclosure;
FIG. 6 is a schematic flow chart of model training according to an embodiment of the present application;
fig. 7 is a schematic flow chart of an image processing method based on an under-screen image according to an embodiment of the present application;
fig. 8 is a schematic diagram of a terminal device interface provided in an embodiment of the present application;
fig. 9 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
In order to facilitate the clear description of the technical solutions of the embodiments of the present application, the following simply describes some terms and techniques related to the embodiments of the present application:
1) Under-screen image: it can be understood as an image captured by a camera that is covered under the screen. The camera may include a camera for capturing color images, a time of flight (TOF) camera, or the like. The terminal device may capture an original raw image by using a TOF camera and analyze the raw image to obtain an infrared (IR) image, a three-dimensional image with depth, and the like.
2) AI model: it can be understood as a model trained based on AI technology to implement a certain function. For example, the AI models involved in the embodiments of the present application may include one or more of the following: an AI preprocessing model, a face detection model, a face recognition model, an open-close eye recognition model, a human eye gaze recognition model, a living body detection model, a three-dimensional anti-counterfeiting model, a gesture recognition model, an expression recognition model, and the like.
The AI preprocessing model is used for processing an under-screen image into a higher-quality image. The face detection model is used for detecting the position of a face in an image. The face recognition model is used for recognizing the identity (ID) corresponding to a face, where the ID may be the identity information, name, authority, and the like corresponding to the face. The open-close eye recognition model is used to recognize whether the human eyes in the image are open or closed. The human eye gaze recognition model is used for recognizing whether the human eyes gaze at the terminal device. The living body detection model is used for detecting whether a living body is in front of the camera. The three-dimensional anti-counterfeiting model is used for identifying whether a three-dimensional attack exists. The gesture recognition model is used to recognize gesture categories, including, for example, a thumbs-up gesture, an OK gesture, or a fist gesture. The expression recognition model is used to recognize expression categories, including, for example, happiness, surprise, sadness, anger, disgust, or fear.
It will be appreciated that the models may be independent or may be combined with each other, for example, the face detection model may be combined with the face recognition model to implement face detection and recognition, etc.
Other naming manners of the models are also possible, for example, the models are named as a first model, a second model, an nth model, a target model, a neural network model, etc., and the embodiment of the present application uses the above naming as a model name example only, and the specific meaning of the model can refer to the specific function of the model.
3) Face unlocking related model: one or more models that can be used for face unlocking. For example, the face unlocking related model may include one or more of the following: a face recognition model, an open-close eye recognition model, a human eye gaze recognition model, and a face anti-counterfeiting model. Several possible ways of training and using the face unlocking related model are described in detail below.
Face recognition model: during training, a face data set containing different identities can be constructed, and a neural network model is trained based on the face data set and a loss function that provides strong discrimination between face feature vectors, so as to obtain the face recognition model.
The loss function with strong discrimination between face feature vectors may include a cross-entropy loss function or a variant of the cross-entropy loss function. For example, the cross-entropy loss function L_c may satisfy the following formula:
L_c = -(1/N) · Σ_i Σ_c y_ic · log(p_ic)
where N is the number of pictures in the face data set (or in the current batch); y_ic is a sign function whose value may be 0 or 1, for example, taking 1 if the true category of sample i equals c and 0 otherwise; and p_ic is the prediction probability that picture i belongs to category c, i.e., the predicted value of the model in the training stage.
After training is completed, when the face recognition model is used, different processes can be executed based on different scenes. For example, when the face recognition model is used for identity recognition and is deployed in an electronic device, the electronic device may first enter face template images of a person from five angles, namely up, down, left, right and facing straight ahead, then extract the face feature vectors with the trained face recognition model, and finally average the five feature vectors and store the result in a template library. Subsequently, the face recognition model is used to extract the face feature vector of a face image of unknown identity to be tested, the similarity between this feature vector and the face feature vectors in the template library is calculated, and the identity corresponding to a template feature vector whose similarity to the feature vector of the unknown face image is greater than a preset threshold is assigned to the test image, thereby obtaining an identity recognition result. For example, the preset threshold may be any value between 0.5 and 1.
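A minimal sketch of this enrollment-and-matching procedure, assuming face_model returns a one-dimensional NumPy feature vector and that cosine similarity is used as the similarity measure (the application does not fix the measure); the threshold value 0.7 is just one point in the stated 0.5 to 1 range.

import numpy as np

def enroll(face_model, template_images):
    # Average the feature vectors of the five template images (up, down, left, right, front).
    feats = [face_model(img) for img in template_images]
    template = np.mean(np.stack(feats), axis=0)
    return template / np.linalg.norm(template)  # normalize for cosine similarity

def identify(face_model, test_image, template_library, threshold=0.7):
    # Match a test image against the template library of enrolled identities.
    feat = face_model(test_image)
    feat = feat / np.linalg.norm(feat)
    best_id, best_sim = None, -1.0
    for identity, template in template_library.items():
        sim = float(np.dot(feat, template))  # cosine similarity of normalized vectors
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim > threshold else None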
Open-close eye recognition model: during training, a face data set labeled with open eyes or closed eyes can be constructed and input into a model to be trained. The model to be trained may first identify face key points using a face key point detection network, then crop out a human eye sub-image according to the key point coordinates, and further output the predicted confidence of open eyes or closed eyes based on the eye sub-image. When the value calculated by the loss function from the predicted values of the model and the corresponding labeled reference values converges, a trained open-close eye recognition model is obtained.
The model structure of the model to be trained may be a classification model based on a convolutional neural network, and the images corresponding to the left eye and the right eye may also have independent reference results during training. The loss function may use a binary cross-entropy loss function or the like.
After training, when the open-close eye recognition model is used, it is deployed in the electronic device, and the electronic device inputs the face image into the open-close eye recognition model to obtain an open-eye or closed-eye output.
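As an illustration only, a minimal PyTorch-style inference sketch of this keypoint-crop-classify pipeline; the keypoint_net and eye_classifier modules, the keypoint output format, the eye key point index, and the crop size are all assumptions for the example.

import torch

def predict_eye_state(face_image, keypoint_net, eye_classifier, crop=32):
    # face_image: (1, C, H, W) tensor; returns "open" or "closed".
    with torch.no_grad():
        # 1) Detect face key points (assumed output shape: (1, K, 2) pixel coordinates).
        keypoints = keypoint_net(face_image)
        # 2) Crop an eye sub-image around an eye key point (index 0 assumed to be an eye center).
        x, y = keypoints[0, 0].round().long().tolist()
        half = crop // 2
        eye_patch = face_image[..., max(y - half, 0):y + half, max(x - half, 0):x + half]
        # 3) Classify the eye patch as open or closed.
        logits = eye_classifier(eye_patch)
        return "open" if logits.argmax(dim=-1).item() == 1 else "closed"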
Human eye gaze recognition model: during training, a face data set labeled with gazing or not gazing can be constructed and input into a model to be trained. The model to be trained may first identify face key points using a face key point detection network, then crop out small left-eye and right-eye image patches according to the key point coordinates, and further output the predicted confidence of gazing or not gazing based on the left-eye and right-eye patches. When the value calculated by the loss function from the predicted values of the model and the corresponding labeled reference values converges, a trained human eye gaze recognition model is obtained.
The model to be trained may be a convolutional neural network model, with a structure including convolutional layers and fully connected layers. The loss function may use a binary cross-entropy loss function or the like.
After training, when the human eye gaze recognition model is used, it is deployed in the electronic device, and the electronic device inputs the face image into the human eye gaze recognition model to obtain a gazing or not-gazing output.
Face anti-counterfeiting model: it may include a two-dimensional anti-counterfeiting model (also referred to as a living body detection model) or a three-dimensional anti-counterfeiting model, and is used for judging whether a face is a real face or a fake face. The model structure of the face anti-counterfeiting model may include a classification model based on a convolutional neural network; an IR image and/or a depth image is used as the input of the face anti-counterfeiting model, and the face anti-counterfeiting model can judge whether the face is a real face or a fake face. The loss function of the face anti-counterfeiting model may also use a binary cross-entropy loss function.
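As an illustration only, a minimal PyTorch sketch of such a convolutional classifier taking both an IR image and a depth image; the application only specifies IR and/or depth input with a binary cross-entropy loss, so the channel stacking and the layer sizes here are assumptions.

import torch
import torch.nn as nn

class FaceAntiCounterfeitNet(nn.Module):
    # Binary classifier over an IR image and a depth image (both single-channel).
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, 2)  # real face vs. fake face

    def forward(self, ir_image, depth_image):
        x = torch.cat([ir_image, depth_image], dim=1)  # (N, 2, H, W)
        x = self.features(x).flatten(1)
        return self.classifier(x)

# Training would use nn.CrossEntropyLoss(), i.e. the cross-entropy loss mentioned above.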
4) Other terms
In the embodiments of the present application, the words "first," "second," and the like are used to distinguish between identical or similar items that have substantially the same function and effect. For example, the first chip and the second chip are merely for distinguishing different chips, and the order of the different chips is not limited. It will be appreciated by those of skill in the art that the words "first," "second," and the like do not limit the amount and order of execution, and that the words "first," "second," and the like do not necessarily differ.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a alone, a and B together, and B alone, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
Face recognition or gesture recognition can be applied to fields such as terminal unlocking, security protection, or electronic payment, so that when using the terminal device, the user can conveniently perform unlocking, access, or payment authorization based on a face or gesture, which can also improve the privacy security of the user when using the terminal device.
Whether face recognition or gesture recognition is performed, the terminal equipment needs to take images by using a camera, and corresponding recognition is realized based on the images. The image may be a two-dimensional color image, an IR image, a three-dimensional image, or the like.
Taking face unlocking for a terminal device as an example, fig. 1 shows a face unlocking scene. As shown in a of fig. 1, when the face of the user faces the terminal device, the terminal device may obtain an image based on the capturing of the TOF camera, and further the terminal device determines whether the unlocking condition is currently met based on the face unlocking related model, for example, if the face in the image matches with a preset face, the terminal device may determine that the unlocking condition is met, implement unlocking, and enter a main interface shown in b of fig. 1.
If the camera of the terminal device is blocked by the display screen, the image captured by the camera may suffer serious quality degradation. Factors of the quality degradation include, for example, one or more of the following: image blurring, Newton rings, diffraction spots, low brightness, low gray values, and the like. Some feature information may be lost in the low-quality image, and inputting the low-quality image into the face unlocking related model can affect the accuracy of the model, thereby reducing the face unlocking success rate.
Therefore, in one possible implementation, the display screen in the terminal device leaves a small hole at the camera, so that shielding of the display screen to the camera is avoided, and an image with better quality is obtained through the camera. However, the mode of opening small holes on the display screen can limit the flexibility of the appearance design of the mobile phone and damage the integrity of the display screen.
In order to maintain the integrity of the display screen, in another possible implementation, the display screen in the terminal device may cover the camera, after the terminal device obtains a low-quality image by using the camera covered under the display screen, the terminal device performs quality improvement processing on the low-quality image, and then inputs the image after quality improvement into a face unlocking related model.
However, in this implementation, the image after quality improvement processing may not match the face unlocking related model, or the recognition effect of the quality-improved image in the face unlocking related model may still differ considerably from the case without screen occlusion, so the face unlocking success rate remains low.
In view of this, the embodiments of the present application provide an image processing method based on an under-screen image. In the method, the AI preprocessing model for improving image quality can be trained jointly with the image processing model, so that the image processed by the AI preprocessing model can obtain a better processing effect in the image processing model.
The image processing model may be any image processing related model, for example, the image processing model may include a gesture recognition model, an expression recognition model, a face unlocking related model, a face payment related model, or the like, which is not specifically limited in the embodiment of the present application. It may be appreciated that, for convenience of description, the embodiment of the present application will be described by taking the image processing model as an example of a face unlocking related model, and the example is not limited to the image processing model specifically.
When the image processing model is a face unlocking related model, the AI preprocessing model can be trained in combination with the face unlocking related model, so that the image processed by the AI preprocessing model can obtain better recognition accuracy in the face unlocking related model. After the terminal device obtains the under-screen image, the under-screen image can be input into the AI preprocessing model, the quality of the under-screen image is improved by the AI preprocessing model, and the quality-improved image is then input into the face unlocking related model, thereby achieving accurate face unlocking.
It can be understood that, because the AI preprocessing model in the embodiment of the application is trained in combination with the face unlocking related model, the image quality output by the AI preprocessing model can meet the requirement of the face unlocking related model, and a better recognition effect is easy to obtain.
It should be noted that the embodiments of the present application may include a training phase of the AI preprocessing model and a use phase of the AI preprocessing model. The training phase may be implemented by an electronic device with relatively high computing power; the specific training process will be described in detail in the following embodiments and is not described here. In the use phase, the AI preprocessing model can be deployed in a terminal device that needs to use it, so that the AI preprocessing model processes the under-screen image and a better image recognition effect is achieved.
The terminal device in the embodiment of the present application may also be any form of electronic device. For example, the electronic device may include a handheld device with an image processing function, an in-vehicle device, and the like. For example, some electronic devices are: a mobile phone, a tablet computer, a palmtop computer, a notebook computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal device in a 5G network or in an evolved public land mobile network (PLMN), and the like, which is not limited by the embodiments of the present application.
By way of example and not limitation, in embodiments of the present application, the electronic device may also be a wearable device. A wearable device may also be called a wearable smart device, and is a general term for devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not only a hardware device, but can also realize powerful functions through software support, data interaction, and cloud interaction. Broadly, wearable smart devices include devices that are full-featured and large-sized and can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only a certain type of application function and need to be used in combination with other devices such as smartphones, for example, various smart bracelets and smart jewelry for monitoring physical signs.
In addition, in the embodiment of the application, the electronic device may also be a terminal device in an internet of things (internet of things, ioT) system, and the IoT is an important component of future information technology development, and the main technical characteristic of the IoT is that the article is connected with a network through a communication technology, so that man-machine interconnection and an intelligent network for internet of things are realized.
The electronic device in the embodiment of the application may also be referred to as: a terminal device, a User Equipment (UE), a Mobile Station (MS), a Mobile Terminal (MT), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent, a user equipment, or the like.
In an embodiment of the present application, the electronic device or each network device includes a hardware layer, an operating system layer running above the hardware layer, and an application layer running above the operating system layer. The hardware layer includes hardware such as a central processing unit (central processing unit, CPU), a memory management unit (memory management unit, MMU), and a memory (also referred to as a main memory). The operating system may be any one or more computer operating systems that implement business processes through processes (processes), such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a windows operating system. The application layer comprises applications such as a browser, an address book, word processing software, instant messaging software and the like.
By way of example, fig. 2 shows a schematic structural diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present invention is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, so that the electrical signal is converted into an image visible to naked eyes. ISP can also optimize the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1. The plurality of cameras 193 may be different in kind, for example, the cameras 193 may include a camera for acquiring a color image or a TOF camera or the like.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to fourier transform the frequency bin energy, or the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and so on.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture, among others. In this embodiment, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 3 is a software configuration block diagram of the electronic device 100 according to the embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system may include: an application layer (applications), an application framework layer (application framework), a hardware abstraction layer (hardware abstraction layer, HAL), and a kernel layer (kernel), which may also be referred to as a driver layer.
The application layer may include a series of application packages.
As shown in fig. 3, the application package may include applications such as camera, gallery, phone, map, music, settings, mailbox, video, and social applications. Optionally, the application package may further include an application program for image recognition, where the application program for image recognition includes an algorithm or a model for image recognition, and the like. It is to be understood that the application program for image recognition may exist alone or may be part of any application program in the application layer, which is not specifically limited in the embodiment of the present application.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 3, the application framework layer may include a window manager, a content provider, a resource manager, a view system, a notification manager, a camera access interface, and the like.
The window manager is used for managing window programs. The window manager may obtain the size of the display screen, determine whether there is a status bar, lock the screen, respond to touching and dragging of the screen, capture screenshots, and so on.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar. It can be used to convey notification-type messages, which can automatically disappear after a short stay without requiring user interaction. For example, the notification manager is used to notify that a download is complete, to provide message reminders, and so on. The notification manager may also present notifications in the form of a chart or scroll-bar text in the status bar at the top of the system, for example a notification of an application running in the background, or present a notification on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is emitted, the terminal device vibrates, or an indicator light blinks.
The camera access interface enables an application to perform camera management and access the camera device. Such as managing the camera for image capture, etc.
The hardware abstraction layer may include a plurality of library modules, which may be, for example, camera library modules, algorithm library modules, and the like. The Android system can load a corresponding library module for the equipment hardware, so that the purpose of accessing the equipment hardware by an application program framework layer is achieved. In the embodiment of the present application, an AI preprocessing model for processing an image may be included in the algorithm library, and any face unlocking related model for implementing face unlocking may be included.
The kernel layer is a layer between hardware and software. The kernel layer is used for driving the hardware so that the hardware works. The kernel layer may include a camera device driver, a display driver, an audio driver, etc., which is not limited in this embodiment of the present application. The hardware layer may include various types of sensors, shooting type sensors including, for example, TOF cameras, multispectral sensors, and the like.
For example, the camera device driver may drive a camera-like sensor in the hardware layer to perform image capturing or the like.
A possible implementation of the image processing method according to the embodiment of the present application is described below with reference to fig. 3.
In one possible implementation manner, the related algorithm models of the embodiment of the present application are set in the algorithm library of the hardware abstraction layer. For example, when face unlocking is performed, the camera access interface can be called through the camera application; the camera access interface manages the camera hardware abstraction layer so that images are acquired through the camera driver; the acquired images are then processed by the AI preprocessing model, the face unlocking algorithms, and other algorithms in the algorithm library of the hardware abstraction layer; and processes such as unlocking the terminal device are then executed.
In another possible implementation manner, the related algorithm models of the embodiment of the present application are set in an image processing application of the application layer. For example, when face payment is performed, the camera access interface can be called through the image processing application; the camera access interface manages the camera hardware abstraction layer so that images are acquired through the camera driver; the acquired images are then processed by the AI preprocessing model, the face unlocking algorithms, and other algorithms in the application layer; and processes such as payment are then executed.
The method for processing an image based on an under-screen image according to the embodiment of the present application will be described in detail by way of specific embodiments. The following embodiments may be combined with each other or implemented independently, and the same or similar concepts or processes may not be described in detail in some embodiments.
Before the image processing method based on the under-screen image in the embodiment of the present application is executed, an AI preprocessing model for improving the quality of the under-screen image needs to be trained in advance in a joint training manner. The joint training manner can be understood as follows: when the AI preprocessing model is trained, it is combined with the related models that will later be used together with it to realize functions such as face unlocking, so that an AI preprocessing model matched with the face-unlocking-related models is obtained; when face unlocking and similar processes are subsequently realized in combination with the AI preprocessing model, the face-unlocking-related models can perform recognition more accurately.
The embodiment of the present application may include two implementation manners of joint training, which are illustrated below with reference to fig. 4 and fig. 5.
In the first embodiment of jointly training the AI preprocessing model, shown in fig. 4, an AI preprocessing model with better image output quality may be trained first; a data set for training the models used in cooperation with the AI preprocessing model is then constructed from the output of the AI preprocessing model. After those models are trained with this data set, the output of the AI preprocessing model can better meet the input requirements of the models that subsequently use it, so that the subsequent models can obtain a better processing effect. For example, when the AI preprocessing model is combined with the face-unlocking-related models to realize functions such as face unlocking, the face-unlocking-related models can obtain better output precision. As shown in fig. 4, the method includes:
S401: a first training data set is acquired.
In the embodiment of the present application, the first training data set may be a data set for training the AI preprocessing model. The first training data set may include first test data and first sample data; for example, the first test data may be poor-quality images and the first sample data may be better-quality images corresponding to the poor-quality images. The images in the first test data may be in one-to-one correspondence with the images in the first sample data, so that when the AI preprocessing model is subsequently trained, the image in the first sample data corresponding to a given image in the first test data can serve as the reference image for the model to be trained.
In one possible implementation, the first test data may be under-screen IR images obtained with a TOF camera. The first sample data may be non-screen IR images obtained with a TOF camera, or in other words, IR images acquired when the TOF camera is not occluded by the screen.
In another possible implementation, the first sample data may be non-screen IR images obtained with a TOF camera, and the first test data may be images synthesized from the first sample data, for example by applying degradation processing such as adding a screen-occlusion effect to the non-screen IR images in the first sample data, so as to obtain first test data of poorer quality. Adding the screen-occlusion effect may include, for example, one or more of the following: adding Newton rings, adding diffraction spots, decreasing gray values, increasing image blur, and the like.
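By way of illustration only, such degradation could be approximated with NumPy and OpenCV roughly as in the following sketch; the ring pattern, blur kernel size, and gray attenuation factor are assumed values chosen for illustration and not the actual degradation pipeline of this application:

    import cv2
    import numpy as np

    def degrade_ir_image(clean_ir, blur_ksize=7, gray_scale=0.6, ring_strength=0.15):
        # Approximate under-screen degradation applied to a clean (non-screen) IR image.
        h, w = clean_ir.shape[:2]
        img = clean_ir.astype(np.float32)

        # 1. Decrease gray values to mimic light loss caused by the screen.
        img *= gray_scale

        # 2. Add a concentric interference pattern as a rough stand-in for Newton rings.
        yy, xx = np.mgrid[0:h, 0:w]
        r2 = (xx - w / 2) ** 2 + (yy - h / 2) ** 2
        rings = 0.5 * (1.0 + np.cos(r2 / (0.002 * h * w)))   # values in [0, 1]
        img *= (1.0 - ring_strength) + ring_strength * rings

        # 3. Blur to mimic diffraction-induced loss of sharpness.
        img = cv2.GaussianBlur(img, (blur_ksize, blur_ksize), 0)

        return np.clip(img, 0, 255).astype(np.uint8)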
In order to cover different situations as much as possible, so that the AI preprocessing model for subsequent training can better process images in different shooting scenes, the first training data set may include first test data and first sample data corresponding to different shooting distances (e.g., 30cm, 40cm, 50 cm), shooting angles, exposure times (e.g., 1000us, 1500 us), camera types, different shooting objects, and/or the like. When the shooting object is an object wearing glasses, the first training data set can further comprise images obtained by wearing different types of glasses by the object.
S402: and training the model to be trained based on the first training data set to obtain an AI preprocessing model.
The electronic device may be provided with a model to be trained, and the model to be trained may be any type of neural network model, for example any of the following: a convolutional neural network (CNN), a generative adversarial network (GAN), a U-shaped convolutional neural network (Unet), a Transformer module, and the like. The Unet may include an encoder and a decoder. For example, the encoder may include 3-5 convolutional layers, the activation function may be LeakyReLU, and there is no normalization layer; the decoder may use up-sampling layers, the number of decoder layers may be one less than the number of encoder layers, the activation function may likewise be LeakyReLU, and there is no normalization layer; feature fusion may be performed between the encoder and the decoder.
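A minimal sketch of such a Unet-style encoder-decoder, assuming PyTorch; the channel counts, number of layers, and LeakyReLU slope are illustrative assumptions, not the exact configuration of this application:

    import torch
    import torch.nn as nn

    class SmallUnet(nn.Module):
        # Encoder-decoder with LeakyReLU activations, no normalization layers,
        # and feature fusion between encoder and decoder stages.
        def __init__(self, in_ch=1, base_ch=32):
            super().__init__()
            self.enc1 = nn.Sequential(nn.Conv2d(in_ch, base_ch, 3, stride=2, padding=1), nn.LeakyReLU(0.2))
            self.enc2 = nn.Sequential(nn.Conv2d(base_ch, base_ch * 2, 3, stride=2, padding=1), nn.LeakyReLU(0.2))
            self.enc3 = nn.Sequential(nn.Conv2d(base_ch * 2, base_ch * 4, 3, stride=2, padding=1), nn.LeakyReLU(0.2))
            self.up1 = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(base_ch * 4, base_ch * 2, 3, padding=1), nn.LeakyReLU(0.2))
            self.up2 = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(base_ch * 2, base_ch, 3, padding=1), nn.LeakyReLU(0.2))
            self.out = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(base_ch, in_ch, 3, padding=1))

        def forward(self, x):
            e1 = self.enc1(x)
            e2 = self.enc2(e1)
            e3 = self.enc3(e2)
            d1 = self.up1(e3) + e2   # feature fusion between encoder and decoder
            d2 = self.up2(d1) + e1
            return self.out(d2)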
The initialization parameters of the model to be trained are not particularly limited in the embodiments of the present application. For example, the parameters of the model to be trained may be initialized using the Kaiming initialization method. Illustratively, the hyper-parameter settings may include: during training, the batch size is 32, the learning rate is set to 0.0002, the optimizer is Adam with parameters β1 = 0.9 and β2 = 0.99, and the number of training epochs is 200.
During training, the test data can be input into the model to be trained, and a loss is calculated between the predicted image output by the model to be trained and the corresponding reference image in the sample data; the parameters of the model to be trained are then updated based on the result of the loss calculation. This process is repeated until the preset maximum number of iterations is reached, or the loss converges, or the loss value is smaller than a certain value, at which point model training can be considered finished and the AI preprocessing model is obtained.
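A sketch of such a training loop under the hyper-parameter settings mentioned above, assuming PyTorch and a data loader that yields (test image, reference image) pairs; the smooth L1 criterion is only one possible loss choice:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader

    def train_preprocess_model(model, paired_dataset, epochs=200):
        # Kaiming initialization of convolution weights.
        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, nonlinearity='leaky_relu')

        loader = DataLoader(paired_dataset, batch_size=32, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.99))
        criterion = nn.SmoothL1Loss()

        for epoch in range(epochs):
            for degraded, reference in loader:
                predicted = model(degraded)             # predicted image
                loss = criterion(predicted, reference)  # compare with the reference image
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model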
Illustratively, when the AI preprocessing model improves the image quality of an under-screen IR image, the effects that can be achieved include, but are not limited to, one or more of the following: making a blurred image clear, removing Newton rings in the image, eliminating diffraction spots in the image, and increasing the gray values of images captured at long distances.
After the trained AI preprocessing model is obtained, the AI preprocessing model can be jointly trained with each face-unlocking-related model to be trained, so as to obtain face-unlocking-related models that can be used in cooperation with the AI preprocessing model. For details, refer to the descriptions of S403 and S404.
S403: a second training data set is acquired.
In the embodiment of the present application, the second training data set may be a data set for training a face unlocking related model. It can be appreciated that the face unlocking related models can be flexibly selected based on specific requirements, and the number of the face unlocking related models can be one or more.
When the number of face-unlocking-related models is plural, the training steps of S403 and S404 may be performed for each model respectively. For example, as shown in fig. 4, when the face-unlocking-related models include a face recognition model, an open/closed-eye recognition model, a human eye gaze recognition model, and a face anti-counterfeiting model, a face recognition data set, an open/closed-eye data set, a human eye gaze data set, and a face anti-counterfeiting data set may be constructed respectively; these data sets are respectively input into the AI preprocessing model, and the corresponding face-unlocking-related models are obtained by training with the outputs of the AI preprocessing model.
S404: and inputting the second training data set into the AI pretreatment model, and training the output of the AI pretreatment model to obtain a face unlocking related model.
In the embodiment of the present application, the second data set for training the face-unlocking-related model is input into the AI preprocessing model, and the images processed by the AI preprocessing model are then used as the input for training the face-unlocking-related model, so that the trained face-unlocking-related model can better recognize images processed by the AI preprocessing model. In a face unlocking scenario, this enables the terminal device to achieve a better unlocking rate based on under-screen images.
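As a sketch of this step, assuming PyTorch; face_model, criterion, and the label format are placeholders for whichever face-unlocking-related model and task is being trained:

    import torch

    def train_task_model(ai_preprocess_model, face_model, second_loader, criterion, optimizer, epochs=50):
        ai_preprocess_model.eval()   # the AI preprocessing model is fixed in this step
        for epoch in range(epochs):
            for image, label in second_loader:
                with torch.no_grad():
                    processed = ai_preprocess_model(image)   # preprocessed image becomes the training input
                output = face_model(processed)
                loss = criterion(output, label)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return face_model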
It should be noted that, unlike the training method in the term introduction section, the training data set of the model in the training method in the embodiment of the present application is an image processed by the AI preprocessing model, and the training method in the embodiment of the present application is similar to the term introduction section in terms of the specific training principle, and is not described herein.
It can be understood that, in the embodiment of the present application, because the AI preprocessing model is trained first, only a small amount of training data is needed in the stage of training the AI preprocessing model itself, which avoids the long data-construction time that large-scale image acquisition would require. In addition, the subsequent face-unlocking-related models are each trained separately, so the training difficulty is low.
In the second embodiment of jointly training the AI preprocessing model, shown in fig. 5, when the AI preprocessing model (which turns poorer-quality images into output of better quality) is trained, it is combined with the models that need to be used together with it, and the outputs of those models are used as feedback factors for training the AI preprocessing model. In this way, the output of the trained AI preprocessing model can better meet the requirements of the models that subsequently use it, so that the subsequent models can obtain a better processing effect. For example, when the AI preprocessing model is combined with the face-unlocking-related models to realize functions such as face unlocking, the face-unlocking-related models can obtain better output precision. As shown in fig. 5, the method includes:
S501: A third training data set is acquired.
In this embodiment of the present application, the third training data set may be a data set for training the AI pre-processing model, and contents in the third training data set may refer to a related expression of the first training data set, which is not described again.
S502: and acquiring a preset model to be trained jointly with the AI pretreatment model.
In this embodiment of the present application, the preset model may be a trained model that is to be used in combination with the AI preprocessing model, and the number of preset models may be one or more.
For convenience of description, in this embodiment of the present application the preset models are exemplified as a face recognition model, a human eye gaze recognition model, an open/closed-eye recognition model, and a face anti-counterfeiting model. It should be understood that the preset model may also be any one or more of the above models.
It should be noted that the face-unlocking-related models mentioned in the embodiment of the present application may be conventional models obtained by training with non-screen images. In other words, the embodiment of the present application can reuse conventional existing models, and the parameters of the preset models do not need to be adjusted when they are later trained or used together with the AI preprocessing model. In this way, the number of models that need to be trained can be reduced.
S503: and training according to the third training data set, the model to be trained, the preset model and the target loss function to obtain an AI preprocessing model.
In this embodiment of the present application, the third training data set may be used as the input of the model to be trained, the output of the model to be trained may be used as the input of the preset models, the outputs of the preset models act on the target loss function, and the parameters of the model to be trained are adjusted according to the target loss function until the target loss function converges, or the target loss function is smaller than a certain value, or the maximum number of training iterations is reached, thereby obtaining the AI preprocessing model. During the training process, the model parameters of the preset models do not need to be adjusted.
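The structure of this joint training could be sketched as follows, assuming PyTorch; target_loss stands for the target loss function built from the individual loss terms defined below:

    import torch

    def joint_train(preprocess_model, preset_models, third_loader, target_loss, epochs=200):
        # Parameters of the preset models are not adjusted during training.
        for m in preset_models.values():
            m.eval()
            for p in m.parameters():
                p.requires_grad_(False)

        optimizer = torch.optim.Adam(preprocess_model.parameters(), lr=2e-4, betas=(0.9, 0.99))
        for epoch in range(epochs):
            for x, y in third_loader:            # x: under-screen IR image, y: reference image
                y_pred = preprocess_model(x)     # output of the model to be trained
                loss = target_loss(y_pred, y, preset_models)   # preset model outputs act on the target loss
                optimizer.zero_grad()
                loss.backward()                  # gradients flow only into the preprocessing model
                optimizer.step()
        return preprocess_model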
For example, as shown in fig. 6, take the case where the preset models include a face recognition model, a human eye gaze recognition model, an open/closed-eye recognition model, and a face anti-counterfeiting model. The loss function of the AI preprocessing model C_θ is defined as L_c, the loss function of the face recognition model F_θ is defined as L_F, the loss function of the human eye gaze recognition model G_θ is defined as L_G, the loss function of the open/closed-eye recognition model E_θ is defined as L_E, and the loss function of the face anti-counterfeiting model R_θ is defined as L_R.
The target loss function L_total is related to L_c, L_F, L_G, L_E, and L_R.
Illustratively, take the case where the batch size (batch-size) is 1 when training on the third data set, the IR image serving as test data is x, and the reference image corresponding to x is y.
The predicted value output by the AI preprocessing model C_θ is: y' = C_θ(x).
L_c may be a loss value computed between y' and y, which reflects the pixel-level difference between y' and y. L_c may be any type of loss function, such as an L1 loss function, an L2 loss function, or a smooth L1 loss function. Illustratively, L_c may be abs(y' - y), or (y' - y)^2, or the following piecewise function:
L_c = 0.5(y' - y)^2, when |y' - y| < 1
L_c = |y' - y| - 0.5, when (y' - y) < -1 or (y' - y) > 1
For the face recognition model F_θ, the inputs may be y' and the reference image y corresponding to y'. After passing through F_θ, y' and y each yield a one-dimensional vector.
L_F can be used to reflect the similarity between F_θ(y') and F_θ(y). The similarity may be calculated using any of the following algorithms: cosine similarity, Euclidean distance, Manhattan distance, and the like. Taking cosine similarity as an example, L_F may satisfy the following formula:
L_F = 1 - cos_sim(F_θ(y'), F_θ(y))
Since y' = C_θ(x), the formula for L_F can be rewritten as:
L_F = 1 - cos_sim(F_θ(C_θ(x)), F_θ(y))
human eye gazing recognition model G θ Open eye and closed eye recognition model E θ Face anti-fake model R θ Both can be a classification model, for example CNN, transformer, multi-layer supercedron (MLP), etc.
The inputs of the human eye gaze recognition model G_θ, the open/closed-eye recognition model E_θ, and the face anti-counterfeiting model R_θ may all be y', the outputs may be the classification results, and their loss functions may be calculated using the binary cross-entropy (BCE) of the classification. Illustratively, L_G, L_E, and L_R may respectively satisfy the following formulas:
L_G = BCE(G_θ(y'), G_θ(y)) = BCE(G_θ(C_θ(x)), G_θ(y))
L_E = BCE(E_θ(y'), E_θ(y)) = BCE(E_θ(C_θ(x)), E_θ(y))
L_R = BCE(R_θ(y'), R_θ(y)) = BCE(R_θ(C_θ(x)), R_θ(y))
The target loss function L_total is related to L_c, L_F, L_G, L_E, and L_R. Illustratively, L_total may satisfy the following formula:
L_total = αL_c + βL_F + γL_G + θL_E + τL_R
wherein α, β, γ, θ, τ can be predetermined constants.
Illustratively, in one possible implementation, α, β, γ, θ, and τ may be set so that L_c, L_F, L_G, L_E, and L_R each have approximately the same degree of influence on L_total. For example, αL_c, βL_F, γL_G, θL_E, and τL_R may all lie within a certain interval, or the ratio between any two of them may be less than 10, and so on. In this way, the weight of each face-unlocking-related model when the AI preprocessing model is trained can be similar, and the AI preprocessing model can be used in combination with each of the face-unlocking-related models.
In another possible implementation, α + β + γ + θ + τ = 1 may also be set, or, at a given single iteration of model training, αL_c ≈ βL_F ≈ γL_G ≈ θL_E ≈ τL_R may be substantially satisfied; the embodiment of the present application is not particularly limited.
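Putting the terms together, the target loss function L_total above could be computed as in the following sketch, assuming PyTorch, that F_θ outputs an embedding vector, and that G_θ, E_θ, and R_θ output probabilities in (0, 1); the dictionary keys and default weights of 1.0 are illustrative only:

    import torch
    import torch.nn.functional as F

    def target_loss(y_pred, y, models, alpha=1.0, beta=1.0, gamma=1.0, theta=1.0, tau=1.0):
        F_theta, G_theta, E_theta, R_theta = models['face'], models['gaze'], models['eye'], models['spoof']

        l_c = F.smooth_l1_loss(y_pred, y)                                     # L_c: pixel-level loss
        l_f = 1.0 - F.cosine_similarity(F_theta(y_pred), F_theta(y)).mean()  # L_F: 1 - cos_sim
        l_g = F.binary_cross_entropy(G_theta(y_pred), G_theta(y))            # L_G: BCE
        l_e = F.binary_cross_entropy(E_theta(y_pred), E_theta(y))            # L_E: BCE
        l_r = F.binary_cross_entropy(R_theta(y_pred), R_theta(y))            # L_R: BCE

        return alpha * l_c + beta * l_f + gamma * l_g + theta * l_e + tau * l_r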
It should be noted that, in the embodiment of the present application, one or more of the face recognition model F_θ, the human eye gaze recognition model G_θ, the open/closed-eye recognition model E_θ, and the face anti-counterfeiting model R_θ can be omitted according to the specific scenario; when the AI preprocessing model is trained, the data related to the omitted model can be removed, and the training manner is similar and is not repeated here. For the types of neural network models that the AI preprocessing model may adopt and the specific enhancement effects that can be achieved on the image, refer to the description of the corresponding embodiment in fig. 4, which is not repeated here.
According to the embodiment of the present application, when the AI preprocessing model is jointly trained, existing models can be reused, so the number of model types that need to be trained can be reduced.
When the related model is obtained through training in the mode of fig. 4 or fig. 5, the trained model can be deployed in terminal equipment needing to use the related model, and the terminal equipment can use the related model to realize corresponding functions.
The method of the embodiment of the present application is described below by taking the example of the terminal device realizing face unlocking. When face unlocking is realized, one or more of the AI preprocessing model, the face recognition model, the open/closed-eye recognition model, the human eye gaze recognition model, and the face anti-counterfeiting model may be used.
The face recognition model can be used to recognize whether the face in the image processed by the AI preprocessing model is consistent with a pre-stored face used to unlock the terminal device. If they are consistent, it can be determined that the face in the under-screen image has unlocking authority; if not, it may be that an unauthorized person intends to intrude into the terminal device, and unlocking can be terminated.
The open/closed-eye recognition model can be used to recognize whether the eyes in the image processed by the AI preprocessing model are open. Open eyes can confirm that the user has an unlocking intention; closed eyes may indicate that someone else is trying to unlock the terminal with the user's face while the user is sleeping or in another closed-eye scenario, and unlocking can be terminated.
The human eye gaze recognition model can be used to recognize whether the eyes in the image processed by the AI preprocessing model are gazing at the screen of the terminal device. Gazing can confirm that the user has an unlocking intention; not gazing may indicate that the user is engaged in other activities in front of the terminal device without an unlocking intention, and unlocking can be terminated.
The face anti-counterfeiting model can be used to recognize whether the image processed by the AI preprocessing model is obtained from a real person. A real person can confirm that the user has an unlocking intention; a non-real person may indicate that someone is trying to intrude into the terminal device with a photo or a model of the user, and unlocking can be terminated.
It can be appreciated that when face unlocking is achieved, the greater the number of face unlocking related models used, the higher the security and quality of user experience during face unlocking may be. The fewer the number of the face unlocking related models is, the lower the computation amount during face unlocking is, which is beneficial to reducing the power consumption of the terminal equipment. In specific use, the terminal device may use one or more models related to face unlocking based on user-defined settings or default settings, etc., which are not specifically limited in the embodiments of the present application.
By way of example, fig. 7 shows a face unlocking flow diagram. As shown in fig. 7, the method may include:
S701: The terminal device acquires an under-screen image.
For example, in the screen-locked state, the terminal device may obtain a raw image by using the TOF camera under the screen, and then parse the raw image to obtain an IR under-screen image.
S702: the terminal equipment inputs the under-screen image into an AI preprocessing model to obtain a processed image.
The AI preprocessing model in the embodiment of the present application may be an AI preprocessing model in the embodiment corresponding to fig. 4, and the face unlocking-related model in the subsequent S703 may be a face unlocking-related model obtained by training in the embodiment corresponding to fig. 4.
The AI preprocessing model in the embodiment of the present application may also be an AI preprocessing model in the embodiment corresponding to fig. 5, and the face unlocking-related model in the subsequent S703 may be any model obtained based on the training of the no-screen image.
The terminal equipment inputs the under-screen image into an AI preprocessing model, so that a processed image with better quality can be obtained.
S703: the terminal equipment takes the processed image as the input of the face unlocking related model, and executes the unlocking flow based on the face unlocking related model.
The terminal device may use the processed image as input of a face recognition model, a open-eye recognition model, a human eye gazing recognition model and a face anti-counterfeiting model, respectively.
If any of the following occurs, the terminal device can exit the unlocking process and unlocking fails: the face recognition model recognizes that the face in the processed image does not match the preset face, the open/closed-eye recognition model recognizes that the eyes in the processed image are closed, the human eye gaze recognition model recognizes that the eyes in the processed image are not gazing at the screen, or the face anti-counterfeiting model recognizes that the person in the processed image is not a real person.
If the face recognition model recognizes that the face in the processed image matches the preset face, the open/closed-eye recognition model recognizes that the eyes in the processed image are open, the human eye gaze recognition model recognizes that the eyes in the processed image are gazing at the screen, and the face anti-counterfeiting model recognizes that the person in the processed image is a real person, the terminal device can realize perception-free unlocking for the user based on the face.
It can be understood that the face recognition model, the open/closed-eye recognition model, the human eye gaze recognition model, and the face anti-counterfeiting model may perform recognition at the same time, or may be ordered in any sequence and perform recognition in that order; the embodiment of the present application is not particularly limited.
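Expressed as Python-style pseudocode over assumed model interfaces (matches, eyes_open, is_gazing, and is_real_person are illustrative names, not APIs defined by this application), the decision flow of S703, using one possible sequential ordering, could look like:

    def try_face_unlock(under_screen_image, preprocess_model, models, stored_face_feature):
        processed = preprocess_model(under_screen_image)

        # Exit the unlocking flow as soon as any check fails.
        if not models['face'].matches(processed, stored_face_feature):
            return False   # face does not match the preset face
        if not models['eye'].eyes_open(processed):
            return False   # eyes are closed
        if not models['gaze'].is_gazing(processed):
            return False   # user is not gazing at the screen
        if not models['spoof'].is_real_person(processed):
            return False   # not an image of a real person
        return True        # perception-free unlocking can be performed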
In the embodiment of the application, when the terminal equipment realizes the face unlocking, the acquired image is processed by the AI pretreatment model which is jointly trained with the face unlocking related model to obtain the image which can be accurately identified by the face unlocking related model, so that the accuracy and the success rate of the face unlocking can be improved.
Optionally, the terminal device may select which face unlocking related models are specifically used to perform face unlocking based on the user setting.
By way of example, fig. 8 shows one possible interface diagram for a user to set a face unlocking related model. As shown in fig. 8, a plurality of face unlocking modes may be displayed in the interface.
For example, the standard mode may include face ID recognition and face authenticity, and when the user selects the standard mode, the terminal device may use the AI preprocessing model to combine with the face recognition model and the face anti-counterfeiting model to execute the face unlocking process.
The mask mode can comprise open/closed-eye recognition, eye gaze recognition, and face authenticity. When the user selects the mask mode, the terminal device can use the AI preprocessing model in combination with the open/closed-eye recognition model, the human eye gaze recognition model, and the face anti-counterfeiting model to execute the face unlocking process. In this implementation, the user can conveniently realize perception-free face unlocking while wearing a mask.
The strict mode can comprise face ID recognition, open eye recognition, eye fixation recognition and face authenticity, and when the user selects the strict mode, the terminal equipment can use an AI preprocessing model to combine with a face recognition model, an open eye recognition model, an eye fixation recognition model and a face anti-counterfeiting model to execute a face unlocking flow. In the implementation, the privacy security and the user experience of face unlocking can be better improved.
In the user-defined mode, the user can select one or more of face ID recognition, open-close eye recognition, eye gaze recognition or face authenticity, respectively, and then the terminal device can execute the face unlocking process based on the corresponding model selected by the user.
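The correspondence between unlocking modes and the models used could be represented as simply as the following sketch; the mode names follow fig. 8 and the dictionary keys are illustrative:

    UNLOCK_MODES = {
        'standard': ['face_recognition', 'face_anti_spoofing'],
        'mask':     ['open_closed_eye', 'eye_gaze', 'face_anti_spoofing'],
        'strict':   ['face_recognition', 'open_closed_eye', 'eye_gaze', 'face_anti_spoofing'],
        # 'custom' is filled from the user's selection in the settings interface.
    }

    def models_for_mode(mode, custom_selection=None):
        if mode == 'custom':
            return list(custom_selection or [])
        return UNLOCK_MODES[mode]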
It will be appreciated that the user interface of fig. 8 is merely illustrative, and the various face unlocking modes are also merely illustrative, and in specific applications, the names of the modes and the corresponding functions may be modified according to requirements, and the modes may be adapted to be deleted or added, which is not specifically limited in the embodiments of the present application.
It should be noted that, if the terminal device performs face unlocking by using the AI preprocessing model and the face-unlocking-related models trained in the embodiment of fig. 4, because the face-unlocking-related models in the embodiment of fig. 4 are independent of one another, the terminal device may store the face-unlocking-related models obtained by training with the method of the embodiment of fig. 4 independently, and call the corresponding models after determining which models are to be used.
If the terminal device performs face unlocking by using the AI preprocessing model and the face-unlocking-related models trained in the embodiment of fig. 5, because each AI preprocessing model is combined with a particular set of face-unlocking-related models during its training in the embodiment of fig. 5, each unlocking mode corresponds to one AI preprocessing model and its corresponding face-unlocking-related models; after determining the unlocking mode to be used, the terminal device can call the AI preprocessing model and the face-unlocking-related models corresponding to that unlocking mode to perform the corresponding unlocking process.
It can be understood that, in the embodiment of the present application, the face unlocking is performed by the terminal device as an example, and the method of the embodiment of the present application may also be applied to scenes such as face payment, which is not described herein.
The foregoing description of the solution provided in the embodiments of the present application has been mainly presented in terms of a method. To achieve the above functions, it includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative method steps described in connection with the embodiments disclosed herein may be implemented as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
According to the embodiment of the application, the device for realizing the image processing method based on the under-screen image can be divided into the functional modules according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
Fig. 9 is a schematic structural diagram of a chip according to an embodiment of the present application. Chip 90 includes one or more (including two) processors 901, communication lines 902, communication interfaces 903, and memory 904.
In some implementations, the memory 904 stores the following elements: executable modules or data structures, or a subset thereof, or an extended set thereof.
The methods described in the embodiments of the present application may be applied to the processor 901 or implemented by the processor 901. The processor 901 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may be a general purpose processor (e.g., a microprocessor or a conventional processor), a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and the processor 901 may implement or perform the methods, steps, and logic diagrams related to the processes disclosed in the embodiments of the present application.
The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is mature in the art, such as a random access memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable read-only memory (EEPROM). The storage medium is located in the memory 904, and the processor 901 reads the information in the memory 904 and performs the steps of the above method in combination with its hardware.
The processor 901, the memory 904, and the communication interface 903 may communicate via a communication line 902.
In the above embodiments, the instructions stored by the memory for execution by the processor may be implemented in the form of a computer program product. The computer program product may be written in the memory in advance, or may be downloaded in the form of software and installed in the memory.
Embodiments of the present application also provide a computer program product comprising one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave), and the storage medium may be a semiconductor medium (e.g., a solid state disk (SSD)) or the like.
Embodiments of the present application also provide a computer-readable storage medium. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Computer readable media can include computer storage media and communication media and can include any medium that can transfer a computer program from one place to another. The storage media may be any target media that is accessible by a computer.
As one possible design, the computer-readable medium may include compact disk read-only memory (CD-ROM), RAM, ROM, EEPROM, or other optical disk memory; the computer readable medium may include disk storage or other disk storage devices. Moreover, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital versatile disc (digital versatile disc, DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing detailed description of the invention has been presented for purposes of illustration and description, and it should be understood that the foregoing is by way of illustration and description only, and is not intended to limit the scope of the invention.

Claims (12)

1. An image processing method based on an under-screen image, the method comprising:
acquiring an under-screen image;
inputting the under-screen image into a pre-trained artificial intelligence AI preprocessing model to obtain a processed image;
inputting the processed image into an image processing model to obtain a processing result;
when the image processing model is a model obtained by training based on images output by the AI preprocessing model, the AI preprocessing model is a model obtained by training with a first training data set, wherein the first training data set comprises first test data and first sample data, a test image in the first test data corresponds to a sample image in the first sample data, and the image quality of the test image is inferior to that of the sample image; the image processing model is obtained by inputting a second training data set into the AI preprocessing model and then training with the output of the AI preprocessing model, and the second training data set comprises a data set related to the function to be realized by the image processing model.
2. The method of claim 1, wherein the under-screen image is an image captured based on a camera disposed under the screen.
3. The method of claim 2, wherein the camera comprises a time of flight TOF camera.
4. The method according to any one of claims 1-3, wherein the image processing model comprises one or more of the following: a face recognition model, an open/closed-eye recognition model, a human eye gaze recognition model, or a face anti-counterfeiting model.
5. The method according to any one of claims 1-4, further comprising:
when the AI preprocessing model is a model obtained by training jointly with the image processing model, the parameters of the image processing model are not adjustable during training of the AI preprocessing model, the parameters of the AI preprocessing model are adjustable, and the AI preprocessing model completes training when the value calculated by using the target loss function converges; wherein the target loss function is related to the loss function of the AI preprocessing model and the loss function of the image processing model.
6. The method according to claim 5, wherein the number of the image processing models is plural, and, when the target loss function is calculated, the weighted difference between any two of the loss function of the AI preprocessing model and the loss functions of the image processing models is smaller than a preset value.
7. The method of claim 6, wherein when the image processing model includes a face recognition model, an open/closed-eye recognition model, a human eye gaze recognition model, and a face anti-counterfeiting model, the target loss function satisfies the following formula:
L_total = αL_c + βL_F + γL_G + θL_E + τL_R
wherein the loss function of the AI preprocessing model is L_c, the loss function of the face recognition model is L_F, the loss function of the human eye gaze recognition model is L_G, the loss function of the open/closed-eye recognition model is L_E, the loss function of the face anti-counterfeiting model is L_R, and α, β, γ, θ, and τ are all preset constants.
8. The method according to any one of claims 1-7, wherein the number and kind of the image processing models are configurable; or, the test image is an image obtained by performing degradation processing on the sample data, the degradation processing comprising one or more of the following: adding Newton rings, adding diffraction spots, decreasing gray values, or increasing image blur.
9. The method of claim 8, wherein the method further comprises:
displaying a first interface, wherein the first interface comprises identifications of a plurality of face unlocking modes, and each identification corresponds to a control;
When the trigger of the target control corresponding to the identification of the target face unlocking mode in the plurality of face unlocking modes is received, setting the image processing model as a model corresponding to the target face unlocking mode.
10. The method of claim 9, wherein the plurality of face unlocking modes comprise a plurality of the following: a standard mode, a mask mode, a strict mode, or a custom mode;
the image processing model corresponding to the standard mode comprises a face recognition model and a face anti-counterfeiting model;
the image processing model corresponding to the mask mode comprises an open/closed-eye recognition model, a human eye gaze recognition model, and a face anti-counterfeiting model;
the image processing model corresponding to the strict mode comprises a face recognition model, an open/closed-eye recognition model, a human eye gaze recognition model, and a face anti-counterfeiting model;
the image processing model corresponding to the custom mode comprises one or more of a face recognition model, an open/closed-eye recognition model, a human eye gaze recognition model, or a face anti-counterfeiting model.
11. An electronic device, comprising: a memory for storing a computer program, and a processor for executing the computer program to perform the method of image processing based on an off-screen image as claimed in any one of claims 1 to 10.
12. A computer-readable storage medium storing instructions that, when executed, cause a computer to perform the method of image processing based on an off-screen image according to any one of claims 1-10.