WO2023124237A9 - Image processing method and apparatus based on under-screen image, and storage medium


Info

Publication number
WO2023124237A9
Authority
WO
WIPO (PCT)
Prior art keywords
model
image
face
image processing
preprocessing
Prior art date
Application number
PCT/CN2022/118604
Other languages
French (fr)
Chinese (zh)
Other versions
WO2023124237A1 (en)
Inventor
周俊伟
宋小刚
刘小伟
陈兵
王国毅
Original Assignee
荣耀终端有限公司
Priority date
Filing date
Publication date
Application filed by 荣耀终端有限公司
Publication of WO2023124237A1
Publication of WO2023124237A9

Classifications

    • G06T5/73
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris

Abstract

Embodiments of this application relate to the technical field of artificial intelligence (AI) and provide an image processing method and apparatus based on an under-screen image, and a storage medium. When the image processing model is a model trained on images output by an AI preprocessing model, the AI preprocessing model, which produces a better-quality image from a poor-quality image, can be trained first, and the output of the AI preprocessing model is then used to construct the data set for training the image processing model that will be used together with it. After the image processing model is trained on this data set, the output of the AI preprocessing model satisfies the input requirements of the image processing model well. Therefore, the under-screen image is input into the AI preprocessing model to obtain a processed image, the processed image is then input into the image processing model, and a relatively good processing result can be obtained. In this way, a terminal device can achieve good image processing without punching a hole in the screen, which improves the flexibility of the terminal device's exterior design.

Description

Image processing method and apparatus based on under-screen image, and storage medium
This application claims priority to the Chinese patent application No. 202111645175.4, filed with the China National Intellectual Property Administration on December 29, 2021 and entitled "Image processing method and apparatus based on under-screen image, and storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the technical field of artificial intelligence (AI), and in particular, to an image processing method and apparatus based on an under-screen image, and a storage medium.
Background
With the development of terminal technologies, the functions of terminal devices are becoming increasingly diverse. For example, a terminal device may provide functions such as face unlocking, face payment, gesture unlocking, and gesture payment. To implement these functions, the terminal device usually needs to capture an image and process it with an AI model.
When an AI model implements these functions, an image of relatively good quality, for example an image with good sharpness and brightness, usually needs to be input into the AI model. This is because the AI model typically performs face recognition, gesture recognition, or the like based on feature information in the image; in a low-quality image this feature information is missing or weak, which degrades the recognition accuracy of the AI model and therefore reduces the accuracy and effect of these functions on the terminal device.
Therefore, in a common implementation, a hole is opened in the screen of the terminal device so that the camera sensor is not blocked by the screen when receiving the optical signal, and an image of relatively good quality is obtained. However, opening a hole in the screen limits the flexibility of the terminal device's exterior design and may also affect the visual experience of users who prefer an uninterrupted screen.
Summary
Embodiments of this application provide an image processing method and apparatus based on an under-screen image, and a storage medium. A good image recognition effect can be obtained based on images captured by a camera disposed below the screen, so the method can be applied to a terminal device whose screen has no opening, and the screen design of the terminal device is not constrained by the image recognition function.
According to a first aspect, an embodiment of this application provides an image processing method based on an under-screen image. The method includes: obtaining an under-screen image; inputting the under-screen image into a pre-trained artificial intelligence (AI) preprocessing model to obtain a processed image; and inputting the processed image into an image processing model to obtain a processing result. When the image processing model is a model trained based on images output by the AI preprocessing model, the AI preprocessing model is a model trained by using a first training data set, where the first training data set includes first test data and first sample data, a test image in the first test data corresponds to a sample image in the first sample data, and the image quality of the test image is worse than that of the corresponding sample image. The image processing model is obtained by inputting a second training data set into the AI preprocessing model and training with the output of the AI preprocessing model, where the second training data set includes data sets related to the functions that the image processing model needs to implement.
In this embodiment of this application, when the image processing model is a model trained based on images output by the AI preprocessing model, the AI preprocessing model, which produces a better-quality image from a poor-quality image, can be trained first, and the output of the AI preprocessing model is then used to construct the data set for training the image processing model that will be used together with it. After the image processing model is trained on this data set, the output of the AI preprocessing model satisfies the input requirements of the image processing model well. Therefore, the under-screen image is input into the AI preprocessing model to obtain a processed image, and the processed image is then input into the image processing model pre-trained in this embodiment of this application, so that a relatively good processing result can be obtained.
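By way of illustration only, the two-stage pipeline described above (AI preprocessing followed by one or more image processing models) could be sketched as follows; this is a minimal sketch assuming PyTorch-style modules, and the function and dictionary names are hypothetical rather than taken from this application.
```python
import torch

def under_screen_inference(preprocess_model, processing_models, raw_image):
    """Minimal sketch: enhance the under-screen image, then run the downstream models.

    preprocess_model:  trained AI preprocessing model (quality enhancement).
    processing_models: dict of downstream models, e.g. {"face_id": ..., "anti_spoof": ...}.
    raw_image:         under-screen image tensor of shape (1, C, H, W).
    """
    preprocess_model.eval()
    results = {}
    with torch.no_grad():
        enhanced = preprocess_model(raw_image)   # processed (quality-enhanced) image
        for name, model in processing_models.items():
            model.eval()
            results[name] = model(enhanced)      # e.g. identity logits, spoofing score
    return enhanced, results
```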
In a possible implementation, the under-screen image is an image captured by a camera disposed under the screen. In this way, the terminal device can achieve good image processing without punching a hole in the screen, which increases the flexibility of the terminal device's exterior design.
In a possible implementation, the test image is an image captured by a camera disposed under the screen, or the test image is an image obtained by performing degradation processing on the sample data. When the test image is captured by a camera disposed under the screen, it is highly similar to the images actually captured by the terminal device in use, which helps train a model with a good recognition effect. When the test image is obtained by degrading the sample data, no specific device is needed to acquire test images, and a large number of test images can easily be generated from good-quality sample data.
In a possible implementation, the camera includes a time-of-flight (TOF) camera.
In a possible implementation, the degradation processing includes one or more of the following: adding Newton rings, adding diffraction spots, reducing grayscale values, or adding image blur. In this way, the degradation processing can produce images close to real under-screen images captured by a camera disposed under the screen, which helps, for example, to train a model with a good recognition effect.
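As an illustrative sketch only, a simple degradation routine covering part of the list above (blur, reduced grayscale values, and a crude diffraction-like bright spot; Newton rings are omitted) might look as follows. The parameter values and the spot model are assumptions for illustration, not values disclosed by this application.
```python
import cv2
import numpy as np

def degrade(sample, blur_ksize=7, gain=0.6, spot_strength=0.4):
    """Turn a good-quality sample image into a synthetic 'under-screen' test image."""
    img = sample.astype(np.float32)
    # 1) Blur to mimic the loss of sharpness caused by the panel.
    img = cv2.GaussianBlur(img, (blur_ksize, blur_ksize), 0)
    # 2) Lower grayscale values to mimic reduced light transmission.
    img = img * gain
    # 3) Add a crude diffraction-like bright spot at the image centre.
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    spot = np.exp(-((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / (0.02 * h * w))
    img = img + spot_strength * 255.0 * (spot if img.ndim == 2 else spot[..., None])
    return np.clip(img, 0, 255).astype(np.uint8)
```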
In a possible implementation, the image processing model includes one or more of the following: a face recognition model, an eye open/closed model, an eye gaze model, or a face anti-spoofing model. In this way, when the AI preprocessing model is used together with the image processing model to implement functions such as face unlocking, the image processing model can achieve good output accuracy. It can be understood that when the image processing model is applied to face unlocking, it may also be referred to as a face-unlocking-related model.
In a possible implementation, when the AI preprocessing model is a model trained jointly with the image processing model, the parameters of the image processing model are not adjustable and the parameters of the AI preprocessing model are adjustable during training of the AI preprocessing model, and the training of the AI preprocessing model is completed when the value calculated by a target loss function converges. The target loss function is related to the loss function of the AI preprocessing model and the loss function of the image processing model. In this way, when training the AI preprocessing model that produces better-quality images from poor-quality images, the image processing model that will later be used together with the AI preprocessing model is included, and the output of the image processing model serves as a feedback factor for training the AI preprocessing model, so the output of the trained AI preprocessing model can better satisfy the requirements of the image processing model. Therefore, the under-screen image is input into the AI preprocessing model pre-trained in this embodiment of this application to obtain a processed image, and the processed image is then input into the image processing model, so that a good processing result can be obtained.
In a possible implementation, there are a plurality of image processing models, and when the target loss function is calculated, the weight difference between any two of the loss function of the AI preprocessing model and the loss functions of the image processing models is less than a preset value. In this way, the weights of the image processing models in training the AI preprocessing model are similar, so that the AI preprocessing model can be used well together with each of the image processing models.
In a possible implementation, when the image processing model includes a face recognition model, an eye open/closed model, an eye gaze model, and a face anti-spoofing model, the target loss function satisfies the following formula:
L_total = α·L_C + β·L_F + γ·L_G + θ·L_E + τ·L_R
where L_C is the loss function of the AI preprocessing model, L_F is the loss function of the face recognition model, L_G is the loss function of the eye gaze recognition model, L_E is the loss function of the eye open/closed recognition model, L_R is the loss function of the face anti-spoofing model, and α, β, γ, θ, and τ are all preset constants. In this way, when the AI preprocessing model is used together with the image processing models to implement functions such as face unlocking, the image processing models can achieve good output accuracy. In addition, this embodiment of this application can reuse conventional existing image processing models, whose parameters do not need to be adjusted during subsequent joint training or joint use with the AI preprocessing model, which reduces the number of models that need to be trained.
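A hedged, PyTorch-style sketch of one training step of this joint scheme is given below: the downstream models are frozen, only the AI preprocessing model receives gradients, and the total loss combines the five terms with the preset weights. The use of an L1 loss for L_C, the `.loss(...)` wrapper methods, and the label keys are illustrative assumptions, not details disclosed by this application.
```python
import torch
import torch.nn.functional as F

def joint_train_step(pre_model, face_model, gaze_model, eye_model, spoof_model,
                     optimizer, batch, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """One optimisation step for the AI preprocessing model with frozen downstream models."""
    alpha, beta, gamma, theta, tau = weights
    degraded, clean, labels = batch               # under-screen image, reference image, task labels

    # The image processing models are fixed: their parameters are not adjustable.
    for m in (face_model, gaze_model, eye_model, spoof_model):
        m.eval()
        for p in m.parameters():
            p.requires_grad_(False)

    enhanced = pre_model(degraded)

    l_c = F.l1_loss(enhanced, clean)                                   # preprocessing loss L_C (placeholder choice)
    l_f = face_model.loss(face_model(enhanced), labels["identity"])    # face recognition loss L_F
    l_g = gaze_model.loss(gaze_model(enhanced), labels["gaze"])        # eye gaze loss L_G
    l_e = eye_model.loss(eye_model(enhanced), labels["eye_state"])     # eye open/closed loss L_E
    l_r = spoof_model.loss(spoof_model(enhanced), labels["liveness"])  # anti-spoofing loss L_R

    total = alpha * l_c + beta * l_f + gamma * l_g + theta * l_e + tau * l_r
    optimizer.zero_grad()
    total.backward()    # gradients flow only into the preprocessing model
    optimizer.step()
    return float(total)
```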
In a possible implementation, the number and types of image processing models are configurable. In this way, the terminal device can flexibly configure the number and types of image processing models based on environment recognition or user settings to meet diverse requirements.
In a possible implementation, the method further includes: displaying a first interface, where the first interface includes identifiers of a plurality of face unlocking modes, and each identifier corresponds to a control; and when a trigger on the target control corresponding to the identifier of a target face unlocking mode among the plurality of face unlocking modes is received, setting the image processing model to the model corresponding to the target face unlocking mode. In this way, a user can flexibly select a desired face unlocking mode, which better meets user requirements.
In a possible implementation, the plurality of face unlocking modes include a plurality of the following: a standard mode, a mask mode, a strict mode, or a custom mode. The image processing models corresponding to the standard mode include the face recognition model and the face anti-spoofing model. The image processing models corresponding to the mask mode include the eye open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model. The image processing models corresponding to the strict mode include the face recognition model, the eye open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model. The image processing models corresponding to the custom mode include one or more of the face recognition model, the eye open/closed recognition model, the eye gaze recognition model, or the face anti-spoofing model. In this way, the user can select a suitable mode for face unlocking based on the environment.
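The mode-to-model mapping described above can be viewed as a simple configuration table; the sketch below is only an illustration of that mapping, and the dictionary keys and function are assumptions rather than an interface disclosed by this application.
```python
# Illustrative mapping from face unlocking mode to the set of image processing models to run.
UNLOCK_MODES = {
    "standard": ["face_recognition", "face_anti_spoofing"],
    "mask":     ["eye_open_closed", "eye_gaze", "face_anti_spoofing"],
    "strict":   ["face_recognition", "eye_open_closed", "eye_gaze", "face_anti_spoofing"],
    # "custom" is any user-selected subset of the four models above.
}

def models_for_mode(mode, custom_selection=None):
    if mode == "custom":
        return list(custom_selection or [])
    return UNLOCK_MODES[mode]
```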
According to a second aspect, an embodiment of this application provides an image processing apparatus. The image processing apparatus may be a terminal device, or a chip or chip system in a terminal device. The image processing apparatus may include a display unit and a processing unit. When the image processing apparatus is a terminal device, the display unit may be a display screen. The display unit is configured to perform the displaying steps, so that the terminal device implements the display-related method described in the first aspect or any possible implementation of the first aspect, and the processing unit is configured to implement any processing-related method in the first aspect or any possible implementation of the first aspect. When the image processing apparatus is a terminal device, the processing unit may be a processor, and the image processing apparatus may further include a storage unit, which may be a memory. The storage unit is configured to store instructions, and the processing unit executes the instructions stored in the storage unit, so that the terminal device implements the method described in the first aspect or any possible implementation of the first aspect. When the image processing apparatus is a chip or chip system in a terminal device, the processing unit may be a processor. The processing unit executes the instructions stored in a storage unit, so that the terminal device implements the method described in the first aspect or any possible implementation of the first aspect. The storage unit may be a storage unit inside the chip (for example, a register or a cache), or a storage unit in the terminal device located outside the chip (for example, a read-only memory or a random access memory).
For example, the processing unit is configured to obtain an under-screen image, input the under-screen image into a pre-trained artificial intelligence (AI) preprocessing model to obtain a processed image, and input the processed image into an image processing model to obtain a processing result. When the image processing model is a model trained based on images output by the AI preprocessing model, the AI preprocessing model is a model trained by using a first training data set, where the first training data set includes first test data and first sample data, a test image in the first test data corresponds to a sample image in the first sample data, and the image quality of the test image is worse than that of the corresponding sample image. The image processing model is obtained by inputting a second training data set into the AI preprocessing model and training with the output of the AI preprocessing model, where the second training data set includes data sets related to the functions that the image processing model needs to implement.
In a possible implementation, the under-screen image is an image captured by a camera disposed under the screen. The test image is an image captured by a camera disposed under the screen, or the test image is an image obtained by performing degradation processing on the sample data.
In a possible implementation, the camera includes a time-of-flight (TOF) camera, and the degradation processing includes one or more of the following: adding Newton rings, adding diffraction spots, reducing grayscale values, or adding image blur.
In a possible implementation, the image processing model includes one or more of the following: a face recognition model, an eye open/closed model, an eye gaze model, or a face anti-spoofing model. In this way, when the AI preprocessing model is used together with the image processing model to implement functions such as face unlocking, the image processing model can achieve good output accuracy. It can be understood that when the image processing model is applied to face unlocking, it may also be referred to as a face-unlocking-related model.
In a possible implementation, when the AI preprocessing model is a model trained jointly with the image processing model, the parameters of the image processing model are not adjustable and the parameters of the AI preprocessing model are adjustable during training of the AI preprocessing model, and the training of the AI preprocessing model is completed when the value calculated by a target loss function converges. The target loss function is related to the loss function of the AI preprocessing model and the loss function of the image processing model.
In a possible implementation, there are a plurality of image processing models, and when the target loss function is calculated, the weight difference between any two of the loss function of the AI preprocessing model and the loss functions of the image processing models is less than a preset value.
In a possible implementation, when the image processing model includes a face recognition model, an eye open/closed model, an eye gaze model, and a face anti-spoofing model, the target loss function satisfies the following formula:
L_total = α·L_C + β·L_F + γ·L_G + θ·L_E + τ·L_R
where L_C is the loss function of the AI preprocessing model, L_F is the loss function of the face recognition model, L_G is the loss function of the eye gaze recognition model, L_E is the loss function of the eye open/closed recognition model, L_R is the loss function of the face anti-spoofing model, and α, β, γ, θ, and τ are all preset constants.
In a possible implementation, the number and types of image processing models are configurable.
In a possible implementation, the display unit is configured to display a first interface, where the first interface includes identifiers of a plurality of face unlocking modes, and each identifier corresponds to a control. When the display unit receives a trigger on the target control corresponding to the identifier of a target face unlocking mode among the plurality of face unlocking modes, the processing unit is configured to set the image processing model to the model corresponding to the target face unlocking mode.
In a possible implementation, the plurality of face unlocking modes include a plurality of the following: a standard mode, a mask mode, a strict mode, or a custom mode. The image processing models corresponding to the standard mode include the face recognition model and the face anti-spoofing model. The image processing models corresponding to the mask mode include the eye open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model. The image processing models corresponding to the strict mode include the face recognition model, the eye open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model. The image processing models corresponding to the custom mode include one or more of the face recognition model, the eye open/closed recognition model, the eye gaze recognition model, or the face anti-spoofing model.
According to a third aspect, an embodiment of this application provides an electronic device, including a processor and a memory, where the memory is configured to store code instructions, and the processor is configured to run the code instructions to perform the method described in the first aspect or any possible implementation of the first aspect.
According to a fourth aspect, an embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program or instructions. When the computer program or instructions are run on a computer, the computer is caused to perform the image processing method based on an under-screen image described in the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, an embodiment of this application provides a computer program product including a computer program. When the computer program is run on a computer, the computer is caused to perform the image processing method based on an under-screen image described in the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, this application provides a chip or a chip system. The chip or chip system includes at least one processor and a communication interface, where the communication interface and the at least one processor are interconnected through a line, and the at least one processor is configured to run a computer program or instructions to perform the image processing method based on an under-screen image described in the first aspect or any possible implementation of the first aspect. The communication interface in the chip may be an input/output interface, a pin, a circuit, or the like.
In a possible implementation, the chip or chip system described above in this application further includes at least one memory, and the at least one memory stores instructions. The memory may be a storage unit inside the chip, for example a register or a cache, or may be a storage unit of the chip (for example, a read-only memory or a random access memory).
It should be understood that the second aspect to the sixth aspect of this application correspond to the technical solution of the first aspect of this application, and the beneficial effects achieved by the aspects and their corresponding feasible implementations are similar; details are not described again.
Brief Description of the Drawings
Figure 1 is a schematic diagram of a scenario to which an embodiment of this application is applicable;
Figure 2 is a schematic structural diagram of an electronic device according to an embodiment of this application;
Figure 3 is a schematic diagram of the software architecture of an electronic device according to an embodiment of this application;
Figure 4 is a schematic flowchart of model training according to an embodiment of this application;
Figure 5 is a schematic flowchart of model training according to an embodiment of this application;
Figure 6 is a schematic flowchart of model training according to an embodiment of this application;
Figure 7 is a schematic flowchart of an image processing method based on an under-screen image according to an embodiment of this application;
Figure 8 is a schematic diagram of a terminal device interface according to an embodiment of this application;
Figure 9 is a schematic structural diagram of a chip according to an embodiment of this application.
Detailed Description of Embodiments
To facilitate a clear description of the technical solutions in the embodiments of this application, some terms and technologies involved in the embodiments of this application are briefly introduced below:
1) Under-screen image: an image captured by a camera covered by the screen. The camera may include a camera for capturing color images, a time-of-flight (TOF) camera, or the like. A terminal device can use the TOF camera to capture a raw image and parse the raw image to obtain an infrared (IR) image, a three-dimensional image with depth, and the like.
2) AI model: a model trained based on AI technology to implement a certain function. For example, the AI models involved in the embodiments of this application may include one or more of the following: an AI preprocessing model, a face detection model, a face recognition model, an eye open/closed recognition model, an eye gaze recognition model, a liveness detection model, a three-dimensional anti-spoofing model, a gesture recognition model, an expression recognition model, and the like.
The AI preprocessing model is used to process an under-screen image into an image of better quality. The face detection model is used to detect the position of a face in an image. The face recognition model is used to identify the identity (ID) corresponding to a face; the identity may be, for example, the identity information, title, or permissions corresponding to the face. The eye open/closed recognition model is used to identify whether the eyes in an image are open or closed. The eye gaze recognition model is used to identify whether the eyes are looking at the terminal device. The liveness detection model is used to detect whether a live subject is in front of the camera. The three-dimensional anti-spoofing model is used to identify whether a three-dimensional attack is present. The gesture recognition model is used to identify gesture categories, for example, a thumbs-up gesture, an OK gesture, or a fist gesture. The expression recognition model is used to identify expression categories, for example, happiness, surprise, sadness, anger, disgust, or fear.
It can be understood that the foregoing models may be independent or combined with each other; for example, the face detection model may be combined with the face recognition model to implement face detection and recognition.
The foregoing models may also be named in other ways, for example, a first model, a second model, an N-th model, a target model, or a neural network model. The embodiments of this application use the foregoing names only as examples of model names; the specific meaning of a model can be determined from the specific function it performs.
3) Face-unlocking-related models: one or more models that can be used for face unlocking. For example, the face-unlocking-related models may include one or more of the following: a face recognition model, an eye open/closed recognition model, an eye gaze recognition model, and a face anti-spoofing model. Possible ways of training and using several face-unlocking-related models are described in detail below.
Face recognition model: During training, a face data set including different identities can be constructed, and a neural network model is trained based on the face data set and a loss function that discriminates well between face feature vectors, to obtain the face recognition model.
The loss function that discriminates well between face feature vectors may include a cross-entropy loss function or a variant of the cross-entropy loss function. For example, the cross-entropy loss function L_C may satisfy the following formula:
L_C = -(1/N) · Σ_i Σ_c y_ic · log(p_ic)
where N is the number of images in the face data set (or the current batch); y_ic is an indicator that takes the value 0 or 1, for example 1 if the true class of sample i is equal to c and 0 otherwise; and p_ic is the predicted probability that image i belongs to class c, that is, the value predicted by the model during training.
After training is completed, different procedures can be performed with the face recognition model depending on the scenario. For example, when the face recognition model is used for identity recognition, the model is deployed on an electronic device. The electronic device may first enroll a person's face template images, for example face images at five angles: up, down, left, right, and frontal. The electronic device then uses the trained face recognition model to extract the face feature vectors, averages the five feature vectors, and stores the result in a template library. Afterwards, for a face image of unknown identity to be tested, the face recognition model extracts its face feature vector and computes the similarity between this feature vector and the face feature vectors in the template library; the identity corresponding to the template feature vector whose similarity with the test image's feature vector is greater than a preset threshold is assigned to the test image, yielding the identity recognition result. For example, the preset threshold may be any value between 0.5 and 1.
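A minimal sketch of this enrollment-and-matching flow is given below, assuming the face recognition model exposes a feature-extraction call (named extract() here only for illustration) and that cosine similarity is used as the similarity measure; the application itself only specifies that a similarity above a preset threshold assigns the enrolled identity.
```python
import numpy as np

def enroll_template(recog_model, face_images):
    """Average the feature vectors of the enrollment images (e.g. up/down/left/right/frontal)."""
    feats = [recog_model.extract(img) for img in face_images]   # hypothetical extract() call
    template = np.mean(np.stack(feats), axis=0)
    return template / np.linalg.norm(template)

def match_identity(recog_model, probe_image, templates, threshold=0.7):
    """Return the enrolled identity most similar to the probe image, if above the threshold."""
    feat = recog_model.extract(probe_image)
    feat = feat / np.linalg.norm(feat)
    best_id, best_sim = None, -1.0
    for identity, template in templates.items():
        sim = float(np.dot(feat, template))                      # cosine similarity
        if sim > best_sim:
            best_id, best_sim = identity, sim
    return best_id if best_sim >= threshold else None
```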
Eye open/closed recognition model: During training, a face data set annotated with open-eye or closed-eye labels can be constructed and input into the model to be trained. The model to be trained may first identify facial key points using a facial key-point detection network, crop eye sub-images based on the coordinates of these key points, and then output a predicted confidence of open or closed eyes based on the eye sub-images. When the value calculated by the loss function from the model's predictions and the corresponding annotated reference values converges, the trained eye open/closed recognition model is obtained.
The structure of the model to be trained may be a classification model based on a convolutional neural network, and during training the images corresponding to the left eye and the right eye may have independent reference results. The loss function may be a binary cross-entropy loss function or the like.
After training is completed, the eye open/closed recognition model is deployed on the electronic device; the electronic device inputs a face image into the model and obtains an open-eye or closed-eye output.
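A rough sketch of this eye-state path is shown below, assuming a landmark detector that returns per-eye landmark arrays and a classifier that returns an "open" probability; the landmark format, crop margin, and 0.5 decision threshold are illustrative assumptions.
```python
import numpy as np

def crop_eye(image, eye_landmarks, margin=0.3):
    """Crop an eye patch from a face image using that eye's (N, 2) landmark coordinates."""
    xs, ys = eye_landmarks[:, 0], eye_landmarks[:, 1]
    w, h = xs.max() - xs.min(), ys.max() - ys.min()
    x0 = int(max(xs.min() - margin * w, 0))
    y0 = int(max(ys.min() - margin * h, 0))
    x1 = int(min(xs.max() + margin * w, image.shape[1]))
    y1 = int(min(ys.max() + margin * h, image.shape[0]))
    return image[y0:y1, x0:x1]

def eye_states(landmark_model, eye_model, face_image):
    """Return (left_open, right_open) booleans from the eye open/closed classifier."""
    landmarks = landmark_model.detect(face_image)        # hypothetical landmark detector
    left = crop_eye(face_image, landmarks["left_eye"])
    right = crop_eye(face_image, landmarks["right_eye"])
    return eye_model.predict(left) > 0.5, eye_model.predict(right) > 0.5
```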
Eye gaze recognition model: During training, a face data set annotated with gaze or non-gaze labels can be constructed and input into the model to be trained. The model to be trained may first identify facial key points using a facial key-point detection network, crop a left-eye patch and a right-eye patch based on the coordinates of these key points, and then output a predicted confidence of gaze or non-gaze based on the left-eye and right-eye patches. When the value calculated by the loss function from the model's predictions and the corresponding annotated reference values converges, the trained eye gaze recognition model is obtained.
The model to be trained may be a convolutional neural network model, which may adopt a convolutional neural network structure composed of convolutional layers and fully connected layers. The loss function may be a binary cross-entropy loss function or the like.
After training is completed, the eye gaze recognition model is deployed on the electronic device; the electronic device inputs a face image into the model and obtains a gaze or non-gaze output.
Face anti-spoofing model: This may include a two-dimensional anti-spoofing model (also called a liveness detection model) or a three-dimensional anti-spoofing model, used to determine whether a face is real or fake. The model structure of the face anti-spoofing model may include a classification model based on a convolutional neural network; with an IR image and/or a depth image as input, the face anti-spoofing model can determine whether the face is real or fake. The network loss function of the face anti-spoofing model may also be a binary cross-entropy loss function.
4) Other terms
In the embodiments of this application, the terms "first", "second", and so on are used to distinguish between identical or similar items whose functions and effects are basically the same. For example, a first chip and a second chip are merely used to distinguish different chips, and their order is not limited. A person skilled in the art can understand that the terms "first", "second", and so on do not limit quantity or execution order, and do not necessarily indicate a difference.
It should be noted that in the embodiments of this application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in this application should not be construed as more preferred or advantageous than other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present a concept in a concrete manner.
In the embodiments of this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression refers to any combination of these items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may indicate: a, b, c, a and b, a and c, b and c, or a, b, and c, where a, b, and c may each be singular or plural.
Face recognition or gesture recognition can be applied in fields such as terminal unlocking, security, and electronic payment, so that when using a terminal device a user can conveniently unlock it, pass permission checks, or pay based on a face or a gesture; this also improves privacy and security when the user uses the terminal device.
For both face recognition and gesture recognition, the terminal device needs to capture an image with a camera and perform the corresponding recognition based on the image. The image may be a two-dimensional color image, an IR image, a three-dimensional image, or the like.
Taking face unlocking of a terminal device as an example of applying face recognition, Figure 1 shows a schematic diagram of a face unlocking scenario. As shown in part a of Figure 1, when the user's face is facing the terminal device, the terminal device can capture an image with the TOF camera, and then determine, based on the face-unlocking-related models, whether the unlocking conditions are currently met. For example, if the face in the image matches the preset face, the terminal device can determine that the unlocking conditions are met, unlock the device, and enter the home screen shown in part b of Figure 1.
If the camera of the terminal device is blocked by the display screen, the images captured by the camera suffer from severe quality degradation; the degradation factors include, for example, one or more of the following: image blur, Newton rings, diffraction spots, reduced brightness, and reduced grayscale values. Low-quality images lose some feature information, and inputting low-quality images into the face-unlocking-related models affects the accuracy of those models and thus reduces the face unlocking success rate.
Therefore, in a possible implementation, a small hole is left in the display screen of the terminal device at the camera position so that the display screen does not block the camera and an image of good quality is obtained through the camera. However, opening a small hole in the display screen limits the flexibility of the phone's exterior design and breaks the integrity of the display screen.
To keep the display screen intact, in another possible implementation the display screen of the terminal device covers the camera; after the terminal device captures a low-quality image with the camera covered by the display screen, it performs quality-enhancement processing on the low-quality image and then inputs the quality-enhanced image into the face-unlocking-related models.
However, in this implementation the quality-enhanced image may not be well adapted to the face-unlocking-related models; in other words, the recognition performance of the enhanced image in the face-unlocking-related models still falls well short of the case without screen occlusion, so the face unlocking success rate remains low.
In view of this, embodiments of this application provide an image processing method based on an under-screen image. In this method, when the AI preprocessing model used to improve image quality is trained, it can be trained jointly with the image processing model, so that the images processed by the AI preprocessing model achieve good processing results in the image processing model.
The image processing model may be any image-processing-related model; for example, it may include a gesture recognition model, an expression recognition model, a face-unlocking-related model, or a face-payment-related model, which is not specifically limited in the embodiments of this application. It can be understood that, for ease of description, the following embodiments of this application use a face-unlocking-related model as an example of the image processing model; this example does not specifically limit the image processing model.
When the image processing model is a face-unlocking-related model, the AI preprocessing model can be trained jointly with the face-unlocking-related model, so that the images processed by the AI preprocessing model achieve good recognition accuracy in the face-unlocking-related model. After obtaining an under-screen image, the terminal device can input it into the AI preprocessing model to improve its quality, and then input the quality-enhanced image into the face-unlocking-related model to implement face unlocking.
It can be understood that, because the AI preprocessing model in the embodiments of this application is trained jointly with the face-unlocking-related model, the image quality output by the AI preprocessing model can meet the requirements of the face-unlocking-related model and a good recognition effect is easily obtained; therefore, when the method of the embodiments of this application is used in face unlocking scenarios, the unlocking success rate can be improved.
It should be noted that the embodiments of this application may include a training phase of the AI preprocessing model and a usage phase of the AI preprocessing model. The training phase can be performed by an electronic device with strong computing power; the specific training process is described in detail in subsequent embodiments and is not repeated here. In the usage phase, the AI preprocessing model can be deployed on a terminal device that needs to use it, so that under-screen images are processed by the AI preprocessing model to achieve a good image recognition effect.
The terminal device in the embodiments of this application may be any form of electronic device. For example, the electronic device may include a handheld device with an image processing function, a vehicle-mounted device, or the like. For example, some electronic devices are: a mobile phone, a tablet computer, a palmtop computer, a notebook computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or another processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal device in a 5G network, or a terminal device in a future evolved public land mobile network (PLMN); this is not limited in the embodiments of this application.
By way of example and not limitation, in the embodiments of this application the electronic device may also be a wearable device. A wearable device, also called a wearable smart device, is a general term for wearable devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing, and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not merely a hardware device; it also achieves powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include full-featured, large-sized devices that can implement all or some functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used with other devices such as smartphones, for example various smart bands and smart jewelry for monitoring physical signs.
In addition, in the embodiments of this application, the electronic device may also be a terminal device in an Internet of things (IoT) system. IoT is an important part of future information technology development; its main technical feature is connecting things to a network through communication technology, thereby implementing an intelligent network of human-machine interconnection and interconnection of things.
The electronic device in the embodiments of this application may also be referred to as: a terminal device, user equipment (UE), a mobile station (MS), a mobile terminal (MT), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, a terminal, a wireless communication device, a user agent, a user apparatus, or the like.
In the embodiments of this application, the electronic device or each network device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer. The hardware layer includes hardware such as a central processing unit (CPU), a memory management unit (MMU), and a memory (also called main memory). The operating system may be any one or more computer operating systems that implement service processing through processes, for example, a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a Windows operating system. The application layer includes applications such as a browser, an address book, word processing software, and instant messaging software.
As an example, FIG. 2 shows a schematic structural diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown in the figure, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and so on. Different processing units may be independent devices or may be integrated into one or more processors.
The controller may generate operation control signals based on instruction operation codes and timing signals to complete the control of instruction fetching and instruction execution.
The processor 110 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory. Repeated access is thereby avoided and the waiting time of the processor 110 is reduced, which improves system efficiency.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and so on.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the present invention are only schematic illustrations and do not constitute a structural limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may also adopt interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
The electronic device 100 implements a display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light emitting diode (AMOLED), a flex light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 can implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element of the camera passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is used to capture static images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1. The multiple cameras 193 may be of different types; for example, the cameras 193 may include a camera for acquiring color images, a TOF camera, and the like.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform and the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transmission mode between neurons in the human brain, it processes input information quickly and can also continuously learn by itself. Intelligent cognition applications of the electronic device 100 can be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, and so on.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example saving files such as music and videos on the external memory card.
The internal memory 121 may be used to store computer-executable program code, and the executable program code includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (for example a sound playback function, an image playback function, and so on). The data storage area may store data created during use of the electronic device 100 (for example audio data, a phone book, and so on). In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and so on. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor.
The software system of the electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, a cloud architecture, or the like. The embodiments of this application take an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 3 is a block diagram of the software structure of the electronic device 100 according to an embodiment of this application.
The layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system may include: an application layer (applications), an application framework layer (application framework), a hardware abstraction layer (HAL), and a kernel layer (kernel), where the kernel layer may also be referred to as the driver layer.
The application layer may include a series of application packages.
As shown in FIG. 3, the application packages may include applications such as camera, gallery, phone, map, music, settings, mailbox, video, and social applications. Optionally, the application packages may also include an application for image recognition, and the image recognition application includes an algorithm or model for image recognition, and so on. It can be understood that the image recognition application may exist independently or may be a part of any application in the application layer, which is not specifically limited in the embodiments of this application.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 3, the application framework layer may include a window manager, a content provider, a resource manager, a view system, a notification manager, a camera access interface, and so on.
The window manager is used to manage window programs. The window manager can obtain the display size, determine whether there is a status bar, lock the screen, touch the screen, drag the screen, capture a screenshot, and so on.
The content provider is used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, a phone book, and so on.
The view system includes visual controls, for example controls that display text and controls that display pictures. The view system can be used to build applications. A display interface may be composed of one or more views. For example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying pictures.
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, video files, and so on.
The notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, and so on. The notification manager may also present notifications in the system status bar at the top of the screen in the form of charts or scroll-bar text, for example notifications of applications running in the background, or notifications that appear on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt tone is emitted, the terminal device vibrates, an indicator light blinks, and so on.
The camera access interface enables applications to perform camera management and access camera devices, for example managing the camera to capture images.
The hardware abstraction layer may contain multiple library modules, for example a camera library module, an algorithm library module, and so on. The Android system can load the corresponding library module for the device hardware, thereby enabling the application framework layer to access the device hardware. In the embodiments of this application, the algorithm library may include the AI preprocessing model for processing images, any face-unlock-related model for implementing face unlocking, and so on.
The kernel layer is the layer between hardware and software. The kernel layer is used to drive the hardware so that the hardware works. The kernel layer may contain a camera device driver, a display driver, an audio driver, and so on, which is not limited in the embodiments of this application. The hardware layer may include various types of sensors; shooting-related sensors include, for example, a TOF camera, a multispectral sensor, and so on.
For example, the camera device driver can drive a camera-type sensor in the hardware layer to perform image capture and the like.
Possible implementations of the image processing method according to the embodiments of this application are described below with reference to FIG. 3.
In one possible implementation, the relevant algorithm models of the embodiments of this application are provided in the algorithm library of the hardware abstraction layer. For example, when face unlocking is performed, the camera access interface can be called through the camera application; the camera access interface manages the camera hardware abstraction layer, which acquires images through the camera driver; the acquired images are further computed in the algorithm library of the hardware abstraction layer by the AI preprocessing model of the embodiments of this application and algorithms such as face unlocking, after which processes such as unlocking the terminal device are executed.
In another possible implementation, the relevant algorithm models of the embodiments of this application are provided in an image processing application at the application layer. For example, when face-based payment is performed, the camera access interface can be called through the image processing application; the camera access interface manages the camera hardware abstraction layer, which acquires images through the camera driver; the acquired images are further computed in the algorithm library at the application layer by the AI preprocessing model of the embodiments of this application and algorithms such as face unlocking, after which processes such as payment are executed.
The image processing method based on under-screen images according to the embodiments of this application is described in detail below through specific embodiments. The following embodiments may be combined with each other or implemented independently, and the same or similar concepts or processes may not be repeated in some embodiments.
When executing the image processing method based on under-screen images of the embodiments of this application, an AI preprocessing model for improving the quality of under-screen images needs to be trained in advance in a joint training manner. Joint training can be understood as follows: when the AI preprocessing model is trained, the related models that will later work together with the AI preprocessing model to implement functions such as face unlocking are involved in the training, so as to obtain an AI preprocessing model adapted to the face-unlock-related models. Then, when the AI preprocessing model is subsequently combined with these models to implement processes such as face unlocking, the face-unlock-related models can achieve more accurate recognition.
In the embodiments of this application, joint training can be implemented in two ways. The two ways of training the AI preprocessing model through joint training are illustrated below with reference to FIG. 4 and FIG. 5.
In the first embodiment of jointly training the AI preprocessing model, shown in FIG. 4, the AI preprocessing model that outputs higher-quality images from lower-quality images can be trained first. The output of the AI preprocessing model is then used to construct a data set for training the models that will be used together with the AI preprocessing model. After the models used together with the AI preprocessing model are trained on this data set, the output of the AI preprocessing model can well satisfy the input requirements of the subsequent models that use this output, so the subsequent models can achieve a better processing effect. For example, when the AI preprocessing model works together with face-unlock-related models to implement functions such as face unlocking, the face-unlock-related models can achieve better output accuracy. As shown in FIG. 4, the method includes:
S401: Obtain a first training data set.
In the embodiments of this application, the first training data set may be a data set used to train the AI preprocessing model. The first training data set may include first test data and first sample data. For example, the first test data may be lower-quality images, and the first sample data may be higher-quality images corresponding to the lower-quality images. The images in the first test data may be in a one-to-one correspondence with the images in the first sample data, so that when the AI preprocessing model is subsequently trained, an image in the first sample data can be used as the reference image when the model to be trained is trained on the corresponding first test data. The lower-quality images may correspond to the test images of the first test data, and the higher-quality images may correspond to the sample images of the first sample data.
In one possible implementation, the first test data may be under-screen IR images obtained with a TOF camera. The first sample data may be screen-free IR images obtained with a TOF camera, that is, IR images collected when the TOF camera is not blocked by a screen.
In another possible implementation, the first sample data may be screen-free IR images obtained with a TOF camera, and the first test data may be images synthesized from the first sample data. For example, degradation processing such as adding a screen occlusion effect is applied to the screen-free IR images in the first sample data to obtain lower-quality first test data. Adding a screen occlusion effect may include, for example, one or more of the following: adding Newton rings, adding diffraction spots, reducing grayscale values, adding an image blur effect, and so on.
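Purely as an illustration, the following sketch shows how such degradation might be synthesized from a screen-free IR image in Python. The specific formulas for the Newton rings and the diffraction ghosts, and all parameter values, are assumptions made for this example and are not taken from the embodiment.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade_ir_image(clean, ring_strength=0.08, ghost_strength=0.06,
                     gray_scale=0.7, blur_sigma=1.5, seed=0):
    """Synthesize a lower-quality 'under-screen' image from a clean IR image.

    clean: float32 array in [0, 1], shape (H, W). Returns an array of the same shape.
    """
    rng = np.random.default_rng(seed)
    h, w = clean.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(np.float32)

    # Newton rings: a concentric interference pattern around a random center.
    cy, cx = rng.uniform(0.3, 0.7) * h, rng.uniform(0.3, 0.7) * w
    r2 = (yy - cy) ** 2 + (xx - cx) ** 2
    rings = 1.0 + ring_strength * np.cos(r2 / (0.002 * h * w))

    # Diffraction "ghosts": faint shifted copies of the bright content.
    ghost = np.roll(clean, shift=(h // 20, w // 20), axis=(0, 1))
    ghost += np.roll(clean, shift=(-h // 20, -w // 20), axis=(0, 1))

    # Lower the grayscale, apply the interference pattern, add ghosts, then blur.
    degraded = gray_scale * clean * rings + ghost_strength * ghost
    degraded = gaussian_filter(degraded, sigma=blur_sigma)
    return np.clip(degraded, 0.0, 1.0).astype(np.float32)
```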
In order to cover different situations as far as possible, so that the subsequently trained AI preprocessing model can handle images from different shooting scenes well, the first training data set may include first test data and first sample data corresponding to different shooting distances (for example 30 cm, 40 cm, 50 cm), shooting angles, exposure times (for example 1000 us, 1500 us), camera types, and/or different photographed subjects. When the photographed subject wears glasses, the first training data set may also include images of the subject wearing different types of glasses.
S402: Train the model to be trained based on the first training data set to obtain the AI preprocessing model.
A model to be trained may be provided in the electronic device. The model to be trained may be any type of neural network model, for example any of the following: a convolutional neural network (CNN), a generative adversarial network (GAN), a U-shaped convolutional neural network (U-net), a transformer module, and so on. The U-net may include an encoder and a decoder. For example, the encoder may include 3 to 5 convolutional layers, the activation function may be leaky ReLU, and no normalization layer is used; the decoder may use upsampling layers, the number of decoder layers may be one fewer than the number of encoder layers, and the activation function may likewise be leaky ReLU with no normalization layer; feature fusion may be performed between the encoder and the decoder.
The initialization parameters of the model to be trained are not specifically limited in the embodiments of this application. For example, the Kaiming initialization method may be used to initialize the parameters of the model to be trained. For example, the hyperparameter settings include: a batch size of 32 during model training, a learning rate of 0.0002, the Adam optimizer with parameters β1 = 0.9 and β2 = 0.99, and 200 training epochs.
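A minimal sketch of such an encoder-decoder, written in PyTorch, is given below. Only the choices stated above (leaky ReLU activations, no normalization layers, encoder-decoder feature fusion, Kaiming initialization) are taken from the description; the layer count, channel widths, and class name are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PreprocessUNet(nn.Module):
    """Illustrative U-net-style preprocessing model: 4 encoder convolutions,
    3 upsampling decoder convolutions, leaky ReLU, no normalization layers,
    and skip-connection feature fusion between encoder and decoder.
    Assumes the input height and width are divisible by 8."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc = nn.ModuleList([
            nn.Conv2d(1, ch, 3, stride=1, padding=1),
            nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1),
            nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1),
            nn.Conv2d(ch * 4, ch * 4, 3, stride=2, padding=1),
        ])
        self.dec = nn.ModuleList([
            nn.Conv2d(ch * 4 + ch * 4, ch * 2, 3, padding=1),  # fused with encoder 3 output
            nn.Conv2d(ch * 2 + ch * 2, ch, 3, padding=1),      # fused with encoder 2 output
            nn.Conv2d(ch + ch, 1, 3, padding=1),               # fused with encoder 1 output
        ])
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.act = nn.LeakyReLU(0.2)
        # Kaiming initialization of all convolution weights.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, a=0.2, nonlinearity="leaky_relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        feats = []
        for conv in self.enc:
            x = self.act(conv(x))
            feats.append(x)
        for i, conv in enumerate(self.dec):
            skip = feats[-(i + 2)]                       # feature fusion with encoder output
            x = torch.cat([self.up(x), skip], dim=1)
            x = self.act(conv(x)) if i < len(self.dec) - 1 else conv(x)
        return x
```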
During training, the test data can be input into the model to be trained, a loss is computed between the predicted image output by the model to be trained and the corresponding reference image in the sample data, and the parameters of the model to be trained are then updated based on the result of the loss computation. This process is repeated until a predetermined maximum number of iterations is reached, or the loss converges, or the loss value is smaller than a certain value, at which point the model training can be considered finished and the AI preprocessing model is obtained.
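A minimal training-loop sketch under the stated hyperparameters might look as follows, reusing the PreprocessUNet sketch above. The placeholder tensors, the L1 pixel loss, and the loss threshold are illustrative assumptions; the stopping criteria mirror the ones described (maximum iterations, loss convergence, or loss below a threshold).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical paired data: degraded test images x and reference images y.
x = torch.rand(256, 1, 128, 128)   # placeholder first test data
y = torch.rand(256, 1, 128, 128)   # placeholder first sample data (reference images)
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

model = PreprocessUNet()           # defined in the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.99))
criterion = nn.L1Loss()            # pixel-wise loss against the reference image
loss_threshold, max_epochs = 1e-3, 200

for epoch in range(max_epochs):
    epoch_loss = 0.0
    for xb, yb in loader:
        pred = model(xb)               # predicted image from the model being trained
        loss = criterion(pred, yb)     # compare against the corresponding reference image
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() * xb.size(0)
    epoch_loss /= len(loader.dataset)
    if epoch_loss < loss_threshold:    # or stop once the loss has converged
        break
```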
For example, when the AI preprocessing model improves the image quality of an under-screen IR image, the effects it can achieve include, but are not limited to, one or more of the following: making a blurry image clear, removing Newton rings from the image, eliminating diffraction spots in the image, and increasing the grayscale values of images shot at long distances.
After the trained AI preprocessing model is obtained, the AI preprocessing model can be jointly trained with each face-unlock-related model to be trained, to obtain face-unlock-related models adapted to the AI preprocessing model. For details, refer to the descriptions of S403 and S404.
S403: Obtain a second training data set.
In the embodiments of this application, the second training data set may be a data set used to train the face-unlock-related models. It can be understood that the face-unlock-related models can be selected flexibly based on specific requirements, and the number of face-unlock-related models may be one or more.
When there are multiple face-unlock-related models, the training steps S403 and S404 can be performed separately for each model. For example, as shown in FIG. 4, when the face-unlock-related models include a face recognition model, an eyes-open/closed model, an eye gaze model, and a face anti-spoofing model, a face recognition data set, an eyes-open/closed data set, an eye gaze data set, and a face anti-spoofing data set can be constructed respectively; each data set is input into the AI preprocessing model, and each face-unlock-related model is obtained by training on the corresponding output of the AI preprocessing model.
S404: Input the second training data set into the AI preprocessing model, and train on the output of the AI preprocessing model to obtain the face-unlock-related models.
In the embodiments of this application, the second data set used to train the face-unlock-related models is first input into the AI preprocessing model, and the images processed by the AI preprocessing model are then used as the input for training the face-unlock-related models. The face-unlock-related models obtained in this way can recognize the images processed by the AI preprocessing model well. Thus, in face unlocking scenarios, the terminal device can achieve a better unlock rate based on under-screen images.
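As a hedged sketch, training one downstream face-unlock-related model on AI-preprocessed images could look like the following. The binary eyes-open/closed classifier, its architecture, and the labeling convention are placeholders; the only point carried over from the description is that the frozen AI preprocessing model runs first and its outputs form the training inputs of the downstream model.

```python
import torch
import torch.nn as nn

preprocess = PreprocessUNet()          # trained AI preprocessing model (sketch above)
preprocess.eval()
for p in preprocess.parameters():      # its parameters are not updated in this step
    p.requires_grad_(False)

# Placeholder eyes-open/closed classifier (any binary classifier would do).
classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def train_step(under_screen_images, labels):
    """labels: 1.0 for eyes open, 0.0 for eyes closed (placeholder convention)."""
    with torch.no_grad():
        processed = preprocess(under_screen_images)   # second data set passed through the AI preprocessing model
    logits = classifier(processed).squeeze(1)
    loss = bce(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```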
It should be noted that, unlike the training method in the terminology introduction section, in the training method of the embodiments of this application the training data set for these models consists of images processed by the AI preprocessing model. The specific training principle is similar to that in the terminology introduction section and is not repeated here.
It can be understood that, because the AI preprocessing model is trained first in the embodiments of this application, the AI preprocessing model only requires a small amount of training data in its own training phase, which alleviates the problem that constructing data through large-scale image collection takes a long time. Moreover, the subsequent face-unlock-related models are each trained separately, so the training difficulty is low.
In the second embodiment of jointly training the AI preprocessing model, shown in FIG. 5, when the AI preprocessing model that outputs higher-quality images from lower-quality images is trained, the models that will subsequently be used together with the AI preprocessing model are involved in the training, and the outputs of those models are used as feedback factors for training the AI preprocessing model. The output of the AI preprocessing model trained in this way can well satisfy the requirements of the subsequent models that use this output, so the subsequent models can achieve a better processing effect. For example, when the AI preprocessing model works together with face-unlock-related models to implement functions such as face unlocking, the face-unlock-related models can achieve better output accuracy. As shown in FIG. 5, the method includes:
S501: Obtain a third training data set.
In the embodiments of this application, the third training data set may be a data set used to train the AI preprocessing model. For the content of the third training data set, refer to the related description of the first training data set, which is not repeated here.
S502: Obtain the preset models to be jointly trained with the AI preprocessing model.
In the embodiments of this application, a preset model may be an already trained model that will be used together with the AI preprocessing model. The number of preset models may be one or more, and the embodiments of this application do not specifically limit the preset models.
For ease of description, the embodiments of this application take the case where the preset models include a face recognition model, an eye gaze recognition model, an eyes-open/closed recognition model, and a face anti-spoofing model as an example. It should be understood that the preset models may also be any one or more of the above models.
It should be noted that the above face-unlock-related models mentioned in the embodiments of this application may be conventional models trained on screen-free images. In other words, the embodiments of this application can reuse conventional, existing models, and the parameters of the preset models do not need to be adjusted during subsequent joint training or use with the AI preprocessing model. In this way, the number of models that need to be trained can be reduced.
S503: Train the AI preprocessing model based on the third training data set, the model to be trained, the preset models, and a target loss function.
In the embodiments of this application, the third training data set can serve as the input of the model to be trained, the output of the model to be trained can serve as the input of the preset models, and the outputs of the preset models contribute to the target loss function. The parameters of the model to be trained can be adjusted according to the target loss function until the target loss function converges, or the target loss function is smaller than a certain value, or the training reaches the maximum number of training iterations, at which point the AI preprocessing model is obtained. During this training process, the model parameters of the preset models do not need to be adjusted.
For example, as shown in FIG. 6, taking the case where the preset models include a face recognition model, an eye gaze recognition model, an eyes-open/closed recognition model, and a face anti-spoofing model as an example, the loss function of the AI preprocessing model C_θ is defined as L_c, the loss function of the face recognition model F_θ is defined as L_F, the loss function of the eye gaze recognition model G_θ is defined as L_G, the loss function of the eyes-open/closed recognition model E_θ is defined as L_E, and the loss function of the face anti-spoofing model R_θ is defined as L_R.
The target loss function L_total may be related to L_c, L_F, L_G, L_E, and L_R.
For example, suppose the batch size during training on the third data set is 1, the IR image serving as the test data is x, and the reference image corresponding to x is y.
The output prediction of the AI preprocessing model C_θ is: y′ = C_θ(x).
L_c may be the loss value computed between y′ and y, which reflects the pixel-wise difference between y′ and y. L_c may be any of the following types of loss functions: an L1 loss, an L2 loss, a smooth L1 loss, and so on. For example, L_c may be abs(y′ − y), or (y′ − y)², or the following piecewise function:
L_c = 0.5(y′ − y)², if |y′ − y| < 1
L_c = |y′ − y| − 0.5, if (y′ − y) < −1 or (y′ − y) > 1
For the face recognition model F_θ, y′ is the input to F_θ, and y is the reference image of y′. Passing y′ and y through F_θ yields one-dimensional vectors.
L_F can be used to reflect the similarity between F_θ(y′) and F_θ(y). The similarity can be computed with any of the following algorithms: cosine similarity, Euclidean distance, Manhattan distance, and so on. Taking cosine similarity as an example, L_F may satisfy the following formula:
L_F = 1 − cos_sim(F_θ(y′), F_θ(y))
Since y′ = C_θ(x), the formula for L_F can be rewritten as:
L_F = 1 − cos_sim(F_θ(C_θ(x)), F_θ(y))
The eye gaze recognition model G_θ, the eyes-open/closed recognition model E_θ, and the face anti-spoofing model R_θ may all be binary classification models, for example a CNN, a transformer, a multi-layer perceptron (MLP), and so on.
The inputs of the eye gaze recognition model G_θ, the eyes-open/closed recognition model E_θ, and the face anti-spoofing model R_θ may all be y′, their outputs may all be binary classification results, and their loss functions may all be computed with the binary cross entropy (BCE). For example, L_G, L_E, and L_R may respectively satisfy the following formulas:
L_G = BCE(G_θ(y′), G_θ(y)) = BCE(G_θ(C_θ(x)), G_θ(y))
L_E = BCE(E_θ(y′), E_θ(y)) = BCE(E_θ(C_θ(x)), E_θ(y))
L_R = BCE(R_θ(y′), R_θ(y)) = BCE(R_θ(C_θ(x)), R_θ(y))
The target loss function L_total is related to L_c, L_F, L_G, L_E, and L_R. For example, L_total may satisfy the following formula:
L_total = αL_c + βL_F + γL_G + θL_E + τL_R
where α, β, γ, θ, and τ may be preset constants.
For example, in one possible implementation, α, β, γ, θ, and τ can be set such that L_c, L_F, L_G, L_E, and L_R each have a similar magnitude of influence on L_total. For example, αL_c, βL_F, γL_G, θL_E, and τL_R can all be kept within a certain numerical range, or the ratio between any two of them can be kept below roughly a factor of 10, and so on. In this way, the face-unlock-related models carry similar weights when the AI preprocessing model is trained, so that the AI preprocessing model can work well together with each face-unlock-related model.
In another possible implementation, α + β + γ + θ + τ = 1 may be set; alternatively, a single training iteration may approximately satisfy αL_c ≈ βL_F ≈ γL_G ≈ θL_E ≈ τL_R, and so on, which is not specifically limited in the embodiments of this application.
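The joint training objective can be written down directly from the formulas above. The sketch below assumes the preset models F_θ, G_θ, E_θ, R_θ are frozen PyTorch modules whose classification heads output probabilities in (0, 1), and the weighting constants are placeholders; only the parameters of C_θ receive gradient updates.

```python
import torch
import torch.nn.functional as F

alpha, beta, gamma, theta, tau = 1.0, 1.0, 1.0, 1.0, 1.0   # preset constants (placeholders)

def joint_loss(C, Fm, Gm, Em, Rm, x, y):
    """L_total = α·L_c + β·L_F + γ·L_G + θ·L_E + τ·L_R for one batch (x, y).

    C is the preprocessing model being trained; Fm, Gm, Em, Rm are the frozen
    preset models (face ID, gaze, eyes-open/closed, anti-spoofing)."""
    y_pred = C(x)                                   # y' = C_θ(x)

    l_c = F.l1_loss(y_pred, y)                      # pixel loss between y' and y

    with torch.no_grad():                           # targets come from the reference image y
        f_ref, g_ref, e_ref, r_ref = Fm(y), Gm(y), Em(y), Rm(y)

    # L_F = 1 - cos_sim(F_θ(y'), F_θ(y)), averaged over the batch.
    l_f = (1 - F.cosine_similarity(Fm(y_pred), f_ref, dim=1)).mean()

    # BCE losses compare each preset model's output on y' with its output on y.
    l_g = F.binary_cross_entropy(Gm(y_pred), g_ref)
    l_e = F.binary_cross_entropy(Em(y_pred), e_ref)
    l_r = F.binary_cross_entropy(Rm(y_pred), r_ref)

    return alpha * l_c + beta * l_f + gamma * l_g + theta * l_e + tau * l_r
```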
It should be noted that, in the embodiments of this application, one or more of the face recognition model F_θ, the eye gaze recognition model G_θ, the eyes-open/closed recognition model E_θ, and the face anti-spoofing model R_θ may be omitted according to the specific scenario. In that case, when the AI preprocessing model is trained, the data related to the omitted models can be removed; the training manner is otherwise similar and is not repeated here. For the types of neural network models that the AI preprocessing model of the embodiments of this application may use, and the specific improvement effects that can be achieved on images, refer to the description of the embodiment corresponding to FIG. 4, which is not repeated here.
When jointly training the AI preprocessing model, the embodiments of this application can reuse existing models, and therefore the number of model types to be trained can be reduced.
After the related models are trained in the manner of FIG. 4 or FIG. 5, the trained models can be deployed in a terminal device that needs to use them, and the terminal device can use the related models to implement the corresponding functions.
The method of the embodiments of this application is described below taking face unlocking on a terminal device as an example. When face unlocking is implemented, one or more of the AI preprocessing model, the face recognition model, the eyes-open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model can be used.
The face recognition model can be used to identify whether the image processed by the AI preprocessing model matches the pre-stored face used to unlock the terminal device. If they match, it can be determined that the face in the under-screen image has unlocking permission; if they do not match, someone without permission may be attempting to intrude into the terminal device, and the unlocking can be terminated.
The eyes-open/closed recognition model can be used to identify whether the eyes in the image processed by the AI preprocessing model are open. If the eyes are open, it can be confirmed that the user intends to unlock; if the eyes are closed, someone else may be trying to unlock the terminal with the user's face while the user is asleep or in another eyes-closed scenario, and the unlocking can be terminated.
The eye gaze recognition model can be used to identify whether the eyes in the image processed by the AI preprocessing model are gazing at the screen of the terminal device. If they are, it can be confirmed that the user intends to unlock; if not, the user may be engaged in another activity in front of the terminal device with no intention of unlocking, and the unlocking can be terminated.
The face anti-spoofing model can be used to identify whether the image processed by the AI preprocessing model is obtained from a real person. If it is a real person, it can be confirmed that the user intends to unlock; if it is not, someone else may be using a photo or a model of the user in an attempt to intrude into the terminal device, and the unlocking can be terminated.
It can be understood that, when face unlocking is implemented, the more face-unlock-related models are used, the higher the security and the quality of the user experience may be; the fewer face-unlock-related models are used, the lower the amount of computation during face unlocking may be, which helps reduce the power consumption of the terminal device. In specific use, the terminal device may use one or more face-unlock-related models based on user-defined settings, default settings, or the like, which is not specifically limited in the embodiments of this application.
As an example, FIG. 7 shows a schematic flowchart of face unlocking. As shown in FIG. 7, the method may include:
S701: The terminal device obtains an under-screen image.
For example, in the lock-screen state, the terminal device can use the TOF camera under the screen to capture a raw image, and further parse the raw image to obtain an under-screen IR image.
S702: The terminal device inputs the under-screen image into the AI preprocessing model to obtain a processed image.
The AI preprocessing model of the embodiments of this application may be the AI preprocessing model in the embodiment corresponding to FIG. 4; in that case, the face-unlock-related models in the subsequent step S703 may be the face-unlock-related models trained in the embodiment corresponding to FIG. 4.
The AI preprocessing model of the embodiments of this application may also be the AI preprocessing model in the embodiment corresponding to FIG. 5; in that case, the face-unlock-related models in the subsequent step S703 may be any models trained on screen-free images.
The terminal device inputs the under-screen image into the AI preprocessing model and obtains a processed image of better quality.
S703: The terminal device uses the processed image as the input of the face-unlock-related models and executes the unlocking procedure based on the face-unlock-related models.
For example, the terminal device can use the processed image as the input of the face recognition model, the eyes-open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model, respectively.
If any of the following situations occurs, the terminal device can exit the unlocking procedure and the unlocking fails: the face recognition model recognizes that the face in the processed image does not match the preset face; the eyes-open/closed recognition model recognizes that the eyes in the processed image are closed; the eye gaze recognition model recognizes that the eyes in the processed image are not gazing at the screen; the face anti-spoofing model recognizes that the person in the processed image is not a real person.
If the face recognition model recognizes that the face in the processed image matches the preset face, the eyes-open/closed recognition model recognizes that the eyes in the processed image are open, the eye gaze recognition model recognizes that the eyes in the processed image are gazing at the screen, and the face anti-spoofing model recognizes that the person in the processed image is a real person, the terminal device can implement face-based unlocking that the user does not perceive.
It can be understood that the face recognition model, the eyes-open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model can perform recognition simultaneously, or can be ordered in any sequence and perform recognition one by one in that order, which is not specifically limited in the embodiments of this application.
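A compact sketch of the decision logic described in S701–S703 is given below. The model objects and the boolean-returning helper functions are placeholders; only the overall flow (preprocess first, then require every enabled check to pass) is taken from the description.

```python
def try_face_unlock(raw_under_screen_image, preprocess, checks):
    """checks: list of (name, check_fn) pairs; each check_fn takes the
    processed image and returns True if its check passes.

    Returns True when every enabled check passes, otherwise False."""
    processed = preprocess(raw_under_screen_image)   # S702: AI preprocessing
    for name, check_fn in checks:                    # S703: run the enabled models in any order
        if not check_fn(processed):
            return False                             # any failed check terminates unlocking
    return True

# Hypothetical usage with four placeholder checks corresponding to the models above.
# checks = [
#     ("face_id", lambda img: face_id_matches(img)),
#     ("eyes_open", lambda img: eyes_are_open(img)),
#     ("gaze", lambda img: gaze_on_screen(img)),
#     ("liveness", lambda img: is_real_person(img)),
# ]
# unlocked = try_face_unlock(ir_image, preprocess_model, checks)
```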
In the embodiments of this application, when implementing face unlocking, the terminal device processes the collected image with the AI preprocessing model that has been jointly trained with the face-unlock-related models, obtaining an image that can be accurately recognized by the face-unlock-related models; therefore the accuracy and success rate of face unlocking can be improved.
Optionally, the terminal device can select, based on user settings, which face-unlock-related models to use for face unlocking.
As an example, FIG. 8 shows a possible interface for the user to set the face-unlock-related models. As shown in FIG. 8, multiple face unlock modes can be displayed in the interface. The interface shown in FIG. 8 may correspond to the first interface, and the identifier of the target face unlock mode may correspond to one of the standard mode, the mask mode, the strict mode, or the custom mode. When the terminal device detects a trigger on the target control corresponding to the identifier of the target face unlock mode among the multiple face unlock modes, it sets the image processing model to the model(s) corresponding to the target face unlock mode.
For example, the standard mode may include face ID recognition and face authenticity. When the user selects the standard mode, the terminal device can use the AI preprocessing model together with the face recognition model and the face anti-spoofing model to execute the face unlocking procedure.
The mask mode may include eyes-open/closed recognition, eye gaze recognition, and face authenticity. When the user selects the mask mode, the terminal device can use the AI preprocessing model together with the eyes-open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model to execute the face unlocking procedure. In this implementation, the user can conveniently achieve imperceptible face unlocking while wearing a mask.
The strict mode may include face ID recognition, eyes-open/closed recognition, eye gaze recognition, and face authenticity. When the user selects the strict mode, the terminal device can use the AI preprocessing model together with the face recognition model, the eyes-open/closed recognition model, the eye gaze recognition model, and the face anti-spoofing model to execute the face unlocking procedure. In this implementation, the privacy security and user experience of face unlocking can be improved.
In the custom mode, the user can select one or more of face ID recognition, eyes-open/closed recognition, eye gaze recognition, and face authenticity, and the terminal device can select the corresponding models based on the user's selection to execute the face unlocking procedure.
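The mapping from unlock mode to the set of enabled checks can be expressed as simple configuration data. The mode names and check identifiers below mirror the description of FIG. 8, but the exact strings and the helper function are assumptions made for this sketch.

```python
# Which face-unlock-related checks each mode enables (per the FIG. 8 description).
UNLOCK_MODES = {
    "standard": ["face_id", "liveness"],
    "mask":     ["eyes_open", "gaze", "liveness"],
    "strict":   ["face_id", "eyes_open", "gaze", "liveness"],
}

def checks_for_mode(mode, custom_selection=None):
    """Return the list of check names for the selected mode.

    For the custom mode, `custom_selection` is the user's own choice of checks."""
    if mode == "custom":
        return list(custom_selection or [])
    return UNLOCK_MODES[mode]
```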
It can be understood that the user interface in FIG. 8 is only an exemplary illustration, and the various face unlock modes are also only exemplary. In specific applications, the name and the corresponding functions of each mode can be modified as required, and modes can also be deleted or added accordingly, which is not specifically limited in the embodiments of this application.
It should be noted that if the terminal device uses the AI preprocessing model and the face-unlock-related models trained in the embodiment of FIG. 4 to perform face unlocking, then because the face-unlock-related models in the embodiment of FIG. 4 are independent of each other, the terminal device can store each face-unlock-related model trained by the method of the embodiment of FIG. 4 independently, and after determining which models need to be used, simply call each of those models.
If the terminal device uses the AI preprocessing model and the face-unlock-related models trained in the embodiment of FIG. 5 to perform face unlocking, then because the AI preprocessing model in the embodiment of FIG. 5 is trained jointly with the face-unlock-related models, each of the above unlock modes corresponds to a set consisting of an AI preprocessing model and the corresponding face-unlock-related models. After determining the unlock mode to be used, the terminal device can call the set of the AI preprocessing model and the corresponding face-unlock-related models for that unlock mode and execute the corresponding unlocking procedure.
It can be understood that the embodiments of this application are described by taking face unlocking performed by a terminal device as an example; the method of the embodiments of this application can also be applied to scenarios such as face-based payment, which will not be described again here.
The foregoing mainly introduces the solutions provided by the embodiments of this application from the perspective of the method. To implement the above functions, corresponding hardware structures and/or software modules for performing each function are included. Those skilled in the art should readily appreciate that, in combination with the method steps of the examples described in the embodiments disclosed herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each specific application, but such implementations should not be considered to go beyond the scope of this application.
The embodiments of this application can divide an apparatus implementing the image processing method based on an under-screen image into functional modules according to the above method examples. For example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module. An integrated module can be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of this application is schematic and is only a logical division of functions; in actual implementation, there may be other division manners.
FIG. 9 is a schematic structural diagram of a chip provided by an embodiment of this application. The chip 90 includes one or more processors 901, a communication line 902, a communication interface 903 and a memory 904.
In some implementations, the memory 904 stores the following elements: executable modules or data structures, or subsets thereof, or extended sets thereof.
The methods described in the foregoing embodiments of this application can be applied to the processor 901 or implemented by the processor 901. The processor 901 may be an integrated circuit chip with signal processing capabilities. During implementation, the steps of the above methods can be completed by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may be a general-purpose processor (for example, a microprocessor or a conventional processor), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate, a transistor logic device or a discrete hardware component. The processor 901 can implement or execute the processing-related methods, steps and logical block diagrams disclosed in the embodiments of this application.
The steps of the methods disclosed in the embodiments of this application can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium mature in the art, such as a random access memory, a read-only memory, a programmable read-only memory or an electrically erasable programmable read-only memory (EEPROM). The storage medium is located in the memory 904; the processor 901 reads the information in the memory 904 and completes the steps of the above methods in combination with its hardware.
The processor 901, the memory 904 and the communication interface 903 can communicate with one another through the communication line 902.
In the above embodiments, the instructions stored in the memory for execution by the processor can be implemented in the form of a computer program product. The computer program product may be written into the memory in advance, or may be downloaded and installed in the memory in the form of software.
An embodiment of this application also provides a computer program product including one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, coaxial cable, optical fiber or digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio or microwave). The computer-readable storage medium can be any usable medium that a computer can store, or a data storage device such as a server or a data center integrating one or more usable media. For example, the usable medium may include a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), or a semiconductor medium (for example, a solid-state drive (SSD)).
An embodiment of this application also provides a computer-readable storage medium. The methods described in the foregoing embodiments can be implemented in whole or in part by software, hardware, firmware or any combination thereof. The computer-readable medium may include a computer storage medium and a communication medium, and may further include any medium that can transfer a computer program from one place to another. The storage medium can be any target medium that can be accessed by a computer.
As a possible design, the computer-readable medium may include a compact disc read-only memory (CD-ROM), a RAM, a ROM, an EEPROM or other optical disc storage; the computer-readable medium may include a magnetic disk memory or another magnetic disk storage device. Furthermore, any connection line may also properly be termed a computer-readable medium. For example, if coaxial cable, fiber-optic cable, twisted pair, DSL or wireless technologies (such as infrared, radio and microwave) are used to transmit software from a website, server or other remote source, then the coaxial cable, fiber-optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of the medium. Disks and discs, as used herein, include compact discs (CD), laser discs, optical discs, digital versatile discs (DVD), floppy disks and Blu-ray discs, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.
The embodiments of this application are described with reference to the flowcharts and/or block diagrams of the methods, devices (systems) and computer program products according to the embodiments of this application. It should be understood that each procedure and/or block in the flowcharts and/or block diagrams, and combinations of procedures and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processing unit of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more procedures of the flowcharts and/or one or more blocks of the block diagrams.
The above specific embodiments further describe the purpose, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, improvement and the like made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.

Claims (12)

  1. An image processing method based on an under-screen image, wherein the method comprises:
    acquiring an under-screen image;
    inputting the under-screen image into a pre-trained artificial intelligence (AI) preprocessing model to obtain a processed image;
    inputting the processed image into an image processing model to obtain a processing result;
    wherein, when the image processing model is a model obtained by training based on images output by the AI preprocessing model, the AI preprocessing model is a model obtained by training with a first training data set, the first training data set comprises first test data and first sample data, a test image in the first test data corresponds to a sample image in the first sample data, and the image quality of the test image is worse than the image quality of the sample image; the image processing model is obtained by inputting a second training data set into the AI preprocessing model and then training with the output of the AI preprocessing model, and the second training data set comprises a data set related to the function to be implemented by the image processing model.
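A minimal sketch of the inference pipeline described in claim 1, assuming PyTorch-style models; the function name and tensor shapes are illustrative assumptions rather than part of the claimed method:

```python
import torch

@torch.no_grad()
def process_under_screen_image(raw: torch.Tensor,
                               ai_preprocess: torch.nn.Module,
                               image_processing: torch.nn.Module) -> torch.Tensor:
    """Claim-1 style pipeline: under-screen image -> AI preprocessing -> image processing."""
    ai_preprocess.eval()
    image_processing.eval()
    # Restore the degraded under-screen capture (diffraction spots, blur, low gray values, ...).
    restored = ai_preprocess(raw.unsqueeze(0))   # (1, C, H, W) processed image
    # Run the downstream task model (e.g., face recognition) on the restored image.
    return image_processing(restored)            # task-specific processing result
```

Usage would follow the same order as the claim, for example `result = process_under_screen_image(img, preprocess_model, face_id_model)`.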
  2. The method according to claim 1, wherein the under-screen image is an image captured by a camera disposed under a screen.
  3. The method according to claim 2, wherein the camera comprises a time-of-flight (TOF) camera.
  4. The method according to any one of claims 1-3, wherein the image processing model comprises one or more of the following: a face recognition model, an open/closed-eye model, an eye-gaze model, or a face anti-spoofing model.
  5. The method according to any one of claims 1-4, wherein the method further comprises:
    when the AI preprocessing model is a model obtained by training jointly with the image processing model, during training of the AI preprocessing model, the parameters of the image processing model are not adjustable and the parameters of the AI preprocessing model are adjustable, and the AI preprocessing model completes training when the value calculated with a target loss function converges; wherein the target loss function is related to the loss function of the AI preprocessing model and the loss function of the image processing model.
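A minimal sketch of such joint training, assuming PyTorch; the optimizer choice, convergence test and `total_loss_fn` are illustrative assumptions and not the claimed training procedure itself:

```python
import torch

def train_preprocess_jointly(ai_preprocess, image_processing_models,
                             loader, total_loss_fn, epochs=10, lr=1e-4, tol=1e-4):
    """Train only the AI preprocessing model; the image processing models stay frozen."""
    for m in image_processing_models:            # claim 5: downstream parameters not adjustable
        m.eval()
        for p in m.parameters():
            p.requires_grad_(False)
    opt = torch.optim.Adam(ai_preprocess.parameters(), lr=lr)
    prev = float("inf")
    for _ in range(epochs):
        running = 0.0
        for degraded, targets in loader:         # degraded under-screen images + task labels
            restored = ai_preprocess(degraded)
            outputs = [m(restored) for m in image_processing_models]
            loss = total_loss_fn(restored, outputs, targets)  # e.g., weighted sum of losses
            opt.zero_grad()
            loss.backward()
            opt.step()
            running += loss.item()
        if abs(prev - running) < tol:            # stop when the target loss has converged
            break
        prev = running
    return ai_preprocess
```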
  6. The method according to claim 5, wherein there are a plurality of image processing models, and when the target loss function is calculated, a weight difference between any two of the loss function of the AI preprocessing model and the loss functions of the image processing models is less than a preset value.
  7. The method according to claim 6, wherein, when the image processing model comprises a face recognition model, an open/closed-eye model, an eye-gaze model and a face anti-spoofing model, the target loss function satisfies the following formula:
    $L_{total} = \alpha L_c + \beta L_F + \gamma L_G + \theta L_E + \tau L_R$
    wherein the loss function of the AI preprocessing model is $L_c$, the loss function of the face recognition model is $L_F$, the loss function of the eye-gaze recognition model is $L_G$, the loss function of the open/closed-eye recognition model is $L_E$, the loss function of the face anti-spoofing model is $L_R$, and α, β, γ, θ and τ are all preset constants.
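A sketch of how the target loss of claim 7 could be combined in code, assuming each individual loss has already been computed as a scalar tensor; the constant values shown are placeholders, not the preset constants of the claim:

```python
# Placeholder weights; in practice α, β, γ, θ, τ are preset constants chosen so that
# the weight difference between any two loss terms stays below a preset value (claim 6).
ALPHA, BETA, GAMMA, THETA, TAU = 1.0, 1.0, 1.0, 1.0, 1.0

def total_loss(l_c, l_f, l_g, l_e, l_r):
    """L_total = α·L_c + β·L_F + γ·L_G + θ·L_E + τ·L_R."""
    return ALPHA * l_c + BETA * l_f + GAMMA * l_g + THETA * l_e + TAU * l_r
```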
  8. The method according to any one of claims 1-7, wherein the number and types of the image processing models are configurable; or, the test image is an image obtained by performing degradation processing on the sample data, and the degradation processing comprises one or more of the following: adding Newton's rings, adding diffraction spots, reducing gray values, or adding an image blur effect.
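A minimal sketch of such degradation processing for building test images from clean samples, assuming NumPy; the kernel size, scaling factor and the crude Gaussian "spot" are illustrative assumptions, not the degradation operators actually used:

```python
import numpy as np

def degrade(sample: np.ndarray, blur_ksize: int = 9, gray_scale: float = 0.6) -> np.ndarray:
    """Degrade a clean sample image into an under-screen-like test image."""
    img = sample.astype(np.float32)
    img *= gray_scale                                   # reduce gray values
    # Simple separable box blur as a stand-in for the under-screen blur effect.
    kernel = np.ones(blur_ksize, dtype=np.float32) / blur_ksize
    for axis in (0, 1):
        img = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="same"), axis, img)
    # A bright Gaussian spot as a crude stand-in for a diffraction spot; Newton's rings
    # could be synthesized similarly from a radial interference pattern.
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 3
    spot = 80.0 * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * (0.05 * min(h, w)) ** 2))
    img += spot if img.ndim == 2 else spot[..., None]
    return np.clip(img, 0, 255).astype(np.uint8)
```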
  9. The method according to claim 8, wherein the method further comprises:
    displaying a first interface, wherein the first interface comprises identifiers of a plurality of face unlocking modes, and each identifier corresponds to a control;
    when a trigger on a target control corresponding to the identifier of a target face unlocking mode among the plurality of face unlocking modes is received, setting the image processing model to a model corresponding to the target face unlocking mode.
  10. The method according to claim 9, wherein the plurality of face unlocking modes comprise a plurality of the following: a standard mode, a mask mode, a strict mode or a custom mode;
    the image processing model corresponding to the standard mode comprises a face recognition model and a face anti-spoofing model;
    the image processing model corresponding to the mask mode comprises an open/closed-eye recognition model, an eye-gaze recognition model and a face anti-spoofing model;
    the image processing model corresponding to the strict mode comprises a face recognition model, an open/closed-eye recognition model, an eye-gaze recognition model and a face anti-spoofing model;
    the image processing model corresponding to the custom mode comprises one or more of a face recognition model, an open/closed-eye recognition model, an eye-gaze recognition model or a face anti-spoofing model.
  11. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a computer program, and the processor is configured to execute the computer program to perform the image processing method based on an under-screen image according to any one of claims 1-10.
  12. A computer-readable storage medium, wherein the computer-readable storage medium stores instructions that, when executed, cause a computer to perform the image processing method based on an under-screen image according to any one of claims 1-10.
PCT/CN2022/118604 2021-12-29 2022-09-14 Image processing method and apparatus based on under-screen image, and storage medium WO2023124237A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111645175.4A CN116416656A (en) 2021-12-29 2021-12-29 Image processing method, device and storage medium based on under-screen image
CN202111645175.4 2021-12-29

Publications (2)

Publication Number Publication Date
WO2023124237A1 WO2023124237A1 (en) 2023-07-06
WO2023124237A9 true WO2023124237A9 (en) 2024-04-04

Family

ID=86997390

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118604 WO2023124237A1 (en) 2021-12-29 2022-09-14 Image processing method and apparatus based on under-screen image, and storage medium

Country Status (2)

Country Link
CN (1) CN116416656A (en)
WO (1) WO2023124237A1 (en)

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9576224B2 (en) * 2014-12-31 2017-02-21 TCL Research America Inc. Robust error correction with multi-model representation for face recognition
CN109766806A (en) * 2018-12-28 2019-05-17 深圳奥比中光科技有限公司 Efficient face identification method and electronic equipment
CN112785507A (en) * 2019-11-07 2021-05-11 上海耕岩智能科技有限公司 Image processing method and device, storage medium and terminal
CN113139911A (en) * 2020-01-20 2021-07-20 北京迈格威科技有限公司 Image processing method and device, and training method and device of image processing model
CN113379610B (en) * 2020-03-10 2024-03-15 Tcl科技集团股份有限公司 Training method of image processing model, image processing method, medium and terminal
CN113379609B (en) * 2020-03-10 2023-08-04 Tcl科技集团股份有限公司 Image processing method, storage medium and terminal equipment
CN111368790A (en) * 2020-03-18 2020-07-03 北京三快在线科技有限公司 Construction method, identification method and construction device of fine-grained face identification model
CN111695421B (en) * 2020-04-30 2023-09-22 北京迈格威科技有限公司 Image recognition method and device and electronic equipment
CN111970451B (en) * 2020-08-31 2022-01-07 Oppo(重庆)智能科技有限公司 Image processing method, image processing device and terminal equipment
CN112861659B (en) * 2021-01-22 2023-07-14 平安科技(深圳)有限公司 Image model training method and device, electronic equipment and storage medium
CN112887598A (en) * 2021-01-25 2021-06-01 维沃移动通信有限公司 Image processing method and device, shooting support, electronic equipment and readable storage medium
CN113420683A (en) * 2021-06-29 2021-09-21 腾讯科技(深圳)有限公司 Face image recognition method, device, equipment and computer readable storage medium
CN113591675A (en) * 2021-07-28 2021-11-02 北京百度网讯科技有限公司 Method, device and equipment for constructing image recognition model and storage medium

Also Published As

Publication number Publication date
CN116416656A (en) 2023-07-11
WO2023124237A1 (en) 2023-07-06

Similar Documents

Publication Publication Date Title
WO2021249053A1 (en) Image processing method and related apparatus
WO2021179773A1 (en) Image processing method and device
CN111782879B (en) Model training method and device
CN113538273B (en) Image processing method and image processing apparatus
WO2021078001A1 (en) Image enhancement method and apparatus
WO2021013132A1 (en) Input method and electronic device
WO2021219095A1 (en) Living body detection method, and related device
WO2021008551A1 (en) Fingerprint anti-counterfeiting method, and electronic device
WO2024031879A1 (en) Method for displaying dynamic wallpaper, and electronic device
CN113705665B (en) Training method of image transformation network model and electronic equipment
CN116152122B (en) Image processing method and electronic device
CN113099146A (en) Video generation method and device and related equipment
CN113538227A (en) Image processing method based on semantic segmentation and related equipment
WO2021218695A1 (en) Monocular camera-based liveness detection method, device, and readable storage medium
CN117274109B (en) Image processing method, noise reduction model training method and electronic equipment
CN111612723B (en) Image restoration method and device
CN116311389B (en) Fingerprint identification method and device
WO2022143314A1 (en) Object registration method and apparatus
WO2023124237A9 (en) Image processing method and apparatus based on under-screen image, and storage medium
EP4303815A1 (en) Image processing method, electronic device, storage medium, and program product
CN115580690B (en) Image processing method and electronic equipment
CN114399622A (en) Image processing method and related device
CN115661941A (en) Gesture recognition method and electronic equipment
WO2022261856A1 (en) Image processing method and apparatus, and storage medium
WO2021244040A1 (en) Facial expression editing method and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913506

Country of ref document: EP

Kind code of ref document: A1