CN116798129A - Living body detection method and device, storage medium and electronic equipment - Google Patents

Living body detection method and device, storage medium and electronic equipment

Info

Publication number
CN116798129A
Authority
CN
China
Prior art keywords
scene
image
sample
model
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310604708.7A
Other languages
Chinese (zh)
Inventor
曹佳炯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310604708.7A priority Critical patent/CN116798129A/en
Publication of CN116798129A publication Critical patent/CN116798129A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/50Maintenance of biometric data or enrolment thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The specification discloses a living body detection method and apparatus, a storage medium, and an electronic device. The living body detection method includes the following steps: determining a target scene property description text based on first sample object detection images in a target scene; generating a plurality of second sample object detection images through artificial intelligence image generation by combining the target scene property description text and the first sample object detection images; creating an initial target living body detection model for the target scene based on a reference living body detection model; and performing model training on the initial target living body detection model with the second sample object detection images to obtain a target living body detection model.

Description

Living body detection method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a living body detection method, a living body detection device, a storage medium, and an electronic device.
Background
With the rapid development of computer technology, biometric recognition technology is widely applied in people's production and daily life. For example, face-scanning payment, face-based access control, face-based attendance, and face-based admission all rely on biometric recognition. As biometric recognition technology is applied ever more widely, the need for living body detection in biometric scenes becomes increasingly prominent. Biometric scenes such as face attendance, face-scanning admission, and face-scanning payment are widely used, so biometric recognition brings convenience to people but also introduces new risks and challenges. The most common means of threatening the security of a biometric system is a living body attack, that is, a technique that attempts to bypass image-based biometric authentication by means of a device screen, a printed photograph, or the like. Therefore, living body detection is particularly important in biometric scenes.
Disclosure of Invention
The specification provides a living body detection method, a living body detection device, a storage medium and electronic equipment, wherein the technical scheme is as follows:
in a first aspect, the present specification provides a living body detection method, the method comprising:
acquiring first sample object detection images in a target scene, and determining a target scene property description text based on each first sample object detection image;
performing artificial intelligence image generation based on the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images, wherein the number of samples of the second sample object detection images is greater than the number of samples of the first sample object detection images;
and creating an initial target living body detection model aiming at the target scene based on a reference living body detection model, and performing model training on the initial target living body detection model by adopting the second sample object detection image to obtain a target living body detection model, wherein the reference living body detection model is the living body detection model in the reference scene.
In a second aspect, the present specification provides a living body detection apparatus, the apparatus comprising:
the data processing module is used for acquiring first sample object detection images in the target scene and determining a target scene property description text based on each first sample object detection image;
The image generation module is used for performing artificial intelligence image generation based on the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images, wherein the number of samples of the second sample object detection images is greater than the number of samples of the first sample object detection images;
the model training module is used for creating an initial target living body detection model aiming at the target scene based on a reference living body detection model, carrying out model training on the initial target living body detection model by adopting the second sample object detection image to obtain a target living body detection model, wherein the reference living body detection model is a living body detection model under the reference scene.
In a third aspect, the present description provides a computer storage medium storing at least one instruction adapted to be loaded by a processor and to perform the method steps of one or more embodiments of the present description.
In a fourth aspect, the present description provides a computer program product storing at least one instruction adapted to be loaded by a processor and to perform the method steps of one or more embodiments of the present description.
In a fifth aspect, the present description provides an electronic device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of one or more embodiments of the present description.
The technical scheme provided by some embodiments of the present specification has the following beneficial effects:
in one or more embodiments of the present disclosure, a target scene property description text is determined based on first sample object detection images in a target scene, and artificial intelligence image generation is performed by combining the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images.
Drawings
In order to more clearly illustrate the technical solutions of the present specification or the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present specification, and that a person of ordinary skill in the art may obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic view of a living body detection system provided in the present specification;
FIG. 2 is a schematic flow chart of a living body detection method provided in the present specification;
FIG. 3 is a schematic view of a scene of another in-vivo detection method provided in the present specification;
FIG. 4 is a flow diagram of one scenario property description generation provided herein;
FIG. 5 is a model training schematic of a scene property description generation model provided herein;
FIG. 6 is a model training schematic of the scene in-vivo sample generation model provided herein;
FIG. 7 is a schematic diagram of a training process of a target living body detection model provided in the present specification;
FIG. 8 is a schematic view of a living body detection apparatus provided in the present specification;
Fig. 9 is a schematic structural view of an electronic device provided in the present specification;
FIG. 10 is a schematic diagram of the architecture of the operating system and user space provided herein;
FIG. 11 is an architecture diagram of the android operating system of FIG. 10;
FIG. 12 is an architecture diagram of the IOS operating system of FIG. 10.
Detailed Description
The technical solutions in the embodiments of the present specification will be described clearly and completely below with reference to the drawings of the present specification. The described embodiments are only some, rather than all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present specification without inventive effort shall fall within the scope of protection of the present specification.
In the description of the present specification, it should be understood that the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. It should also be noted that, unless expressly specified and limited otherwise, "comprise" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only the listed steps or elements, but may include other steps or elements not listed or inherent to such a process, method, article, or apparatus. The specific meanings of the terms in this specification will be understood by those of ordinary skill in the art in light of the specific circumstances. In addition, in the description of the present specification, unless otherwise indicated, "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
In the related art, various living body detection methods have been proposed to detect living body attacks. Living body detection is usually performed by training a living body detection model for certain reference deployment scenes. However, when living body detection faces a new scene different from the reference deployment scenes, the living body detection model trained in the reference deployment scenes objectively has a cross-scene adaptation problem and needs to be retrained, and sample images need to be re-acquired and re-annotated in the new scene, which still poses huge challenges.
the present specification is described in detail below with reference to specific examples.
Please refer to fig. 1, which is a schematic diagram of a living body detection system provided in the present specification. As shown in fig. 1, the in-vivo detection system may include at least a client cluster and a service platform 100.
The client cluster may include at least one client, as shown in fig. 1, specifically including a client 1 corresponding to a user 1, a client 2 corresponding to a user 2, …, and a client n corresponding to a user n, where n is an integer greater than 0.
Each client in the client cluster may be a communication-enabled electronic device including, but not limited to: wearable devices, handheld devices, personal computers, tablet computers, vehicle-mounted devices, smart phones, computing devices, or other processing devices connected to a wireless modem, etc. Electronic devices in different networks may be called different names, for example: a user equipment, an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent or user equipment, a cellular telephone, a cordless telephone, a personal digital assistant (personal digital assistant, PDA), an electronic device in a 5G network or future evolution network, and the like.
The service platform 100 may be an independent server device, for example, a rack-mounted, blade, tower, or cabinet server device, or a hardware device with strong computing capability such as a workstation or a mainframe computer; it may also be a server cluster composed of a plurality of servers. The servers in the server cluster may be composed in a symmetrical manner, where each server is functionally equivalent in a transaction link and may independently provide services to the outside, and providing services independently can be understood as requiring no assistance from another server.
In one or more embodiments of the present disclosure, the service platform 100 may establish a communication connection with at least one client in the client cluster, and complete interaction of data in a living body detection process, such as online transaction data interaction, based on the communication connection. For example, the client may collect a target object image of a target object in the environment where the client is located and send the target object image to the service platform 100, and the service platform 100 executes the living body detection method corresponding to one or more embodiments of the present disclosure to perform living body attack detection processing. For another example, the service platform 100 may deliver the target living body detection model to at least one client in the target scene, so that any client can perform living body detection processing in the target scene based on the target living body detection model.
It should be noted that the service platform 100 establishes a communication connection with at least one client in the client cluster through a network for interactive communication. The network may be a wireless network or a wired network; the wireless network includes, but is not limited to, a cellular network, a wireless local area network, an infrared network, or a Bluetooth network, and the wired network includes, but is not limited to, an Ethernet network, a universal serial bus (Universal Serial Bus, USB), or a controller area network. In one or more embodiments of the specification, techniques and/or formats including Hypertext Markup Language (HTML), Extensible Markup Language (XML), and the like are used to represent data exchanged over the network (e.g., target compression packages). All or some of the links may also be encrypted using conventional encryption techniques such as Secure Socket Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), Internet Protocol Security (IPsec), and the like. In other embodiments, custom and/or dedicated data communication techniques may also be used in place of or in addition to the data communication techniques described above.
The embodiment of the living body detection system provided in the present specification belongs to the same concept as the living body detection method in one or more embodiments, and an execution subject corresponding to the living body detection method related in one or more embodiments in the present specification may be the service platform 100 described above; the execution subject corresponding to the living body detection method in one or more embodiments of the specification may also be an electronic device corresponding to a client, which is specifically determined based on an actual application environment. The implementation process of the embodiment of the living body detection system may be described in detail in the following method embodiments, which are not described herein.
Based on the schematic view of the scenario shown in fig. 1, the living body detection method provided in one or more embodiments of the present specification is described in detail below.
Referring to fig. 2, a schematic flowchart of a living body detection method is provided for one or more embodiments of the present description; the method may be implemented by relying on a computer program and may run on a living body detection apparatus based on a von Neumann system. The computer program may be integrated in an application or may run as an independent tool-class application. The living body detection apparatus may be an electronic device.
Specifically, the living body detection method comprises the following steps:
It will be appreciated that, in some identity authentication scenarios in which the real physiological characteristics of an object need to be determined, image-based living body detection requires verifying, based on the acquired target object detection image, whether the image comes from the operation of a real living object. Image-based living body detection needs to effectively resist common living body attack means such as photographs, face swapping, masks, occlusion, and screen re-shooting, thereby helping to identify fraudulent behavior and safeguard the rights and interests of users;
s102: acquiring first sample object detection images in a target scene, and determining a target scene property description text based on each first sample object detection image;
in the present specification, the image modality type of the first sample object detection image is not limited; the image modality type may be a combination of one or more of modality types such as a video modality carrying object information, a color image (RGB) modality, a short video modality, an animation modality, a depth image (Depth) modality, an infrared image (IR) modality, and a near-infrared (NIR) modality.
Illustratively, in an actual target scene, a certain number of first sample object detection images may be acquired, based on the corresponding living body detection task, by, for example, an RGB camera, a monocular camera, or an infrared camera.
In one or more embodiments of the present disclosure, an artificial intelligence image generation manner is innovatively used to generate high-quality samples in the new scene, that is, the target scene, and the generated samples are used to train the living body detection model for the target scene, which greatly reduces the dependence on manual annotation in the new scene and improves adaptation efficiency.
Specifically, for the target scene, that is, the new scene to which the living body detection model needs to be adapted, a small number of new scene images, namely a certain number of first sample object detection images in the target scene, are first acquired, and a suitable target scene property description text is obtained by using the first sample object detection images. The target scene property description text can be regarded as a scene description prompt text;
the target scene property description text includes a scene commonality description text and a scene characteristic description text. For the same scene, different images in the scene share common description features, such as an outdoor feature or a dim-light feature; meanwhile, for different images in the same scene, there may also be certain characteristic features (caused by differences in deployment angle and deployment position), such as a special building in the scene. When generating the second sample object detection images of the target scene, an accurate new scene description text (namely, the target scene property description text) is first extracted from a small number of new scene samples (the first sample object detection images);
S104: performing artificial intelligence image generation based on the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images, wherein the number of samples of the second sample object detection images is greater than the number of samples of the first sample object detection images;
specifically, after the accurate new scene description text (i.e., the target scene property description text) is extracted, a large number of similar-scene samples (second sample object detection images) can then be generated by combining the target scene property description text with the acquired small number of new scene samples (first sample object detection images).
Specifically, an artificial intelligence image generation method, namely AI-generated content (AI Generated Content, AIGC), is used to generate a plurality of second sample object detection images of different types in the specific target scene. The second sample object detection images are sample object detection images in the target scene and include an attack sample type and a living body sample type.
AIGC generally refers to content generated using artificial intelligence; in this method, it specifically refers to image content generated by AI;
s106: and creating an initial target living body detection model aiming at the target scene based on a reference living body detection model, and performing model training on the initial target living body detection model by adopting the second sample object detection image to obtain a target living body detection model, wherein the reference living body detection model is the living body detection model in the reference scene.
Specifically, the initial target living body detection model is created based on a reference living body detection model that already exists or has been trained in the reference scene. Executing the living body detection method of the present specification thus realizes cross-scene living body detection;
specifically, the generated samples, that is, the second sample object detection images, are used to efficiently train the initial target living body detection model, thereby obtaining the target living body detection model adapted to the new scene. In other words, the living body detection model in the reference scene is successfully applied to the new target scene. This avoids training a living body detection model from scratch for the new scene when sample images in the new scene are scarce, achieves better performance in the new target scene without requiring a large amount of manual annotation or long model training time, and improves the efficiency of model adaptation in the new scene.
It should be noted that the initial target living body detection model and the reference living body detection model according to one or more embodiments of the present disclosure are each built on a machine learning model, and the machine learning model includes, but is not limited to, a combination of one or more of a convolutional neural network (Convolutional Neural Network, CNN) model, a deep neural network (Deep Neural Network, DNN) model, a recurrent neural network (Recurrent Neural Networks, RNN) model, an embedding (Embedding) model, a gradient boosting decision tree (Gradient Boosting Decision Tree, GBDT) model, a logistic regression (Logistic Regression, LR) model, and the like.
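As a rough illustration of creating the initial target living body detection model from the reference model, the following Python sketch simply copies the reference model's structure and weights as the starting point for target-scene training; the function name and the use of a plain weight copy are assumptions for illustration and are not fixed by this specification.

```python
# Minimal sketch, assuming the initial target model starts from a copy of the
# reference living body detection model (names are hypothetical placeholders).
import copy

def create_initial_target_model(reference_model):
    # Reuse the reference scene model's architecture and weights as the
    # initialization for training on the generated target-scene samples.
    return copy.deepcopy(reference_model)
```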
In the actual application stage, the target living body detection model obtained through training can be deployed, and corresponding living body detection decisions can be made. The electronic device can deploy the target living body detection model to at least one client in the target scene; when a user of the client performs living body detection transactions (such as identity authentication, access control, and attendance), for example, a facial image of the user can be acquired while the user performs identity authentication;
in a specific implementation scenario, the target living body detection model is deployed under the target scenario, so that living body detection processing is performed on target object detection images in the target scenario through the target living body detection model.
Referring to fig. 3, a schematic view of a living body detection scenario is provided in an embodiment of the present disclosure. As shown in fig. 3, the illustrated client is an apparatus for performing identity authentication in a target scene, on which an image capturing device is configured, and when a user approaches the client and is within an image capturing range of the image capturing device, the image capturing device captures a target object detection image (such as a face image of the user shown in fig. 3) of a person, and performs in-vivo detection on the captured target object detection image using a target in-vivo detection model.
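As a non-limiting illustration of the deployment and detection flow described above, the following Python sketch shows how a client-side device might invoke an already trained and exported target living body detection model on a captured face image; the model export format, the function names, the single-logit output, and the 0.5 decision threshold are assumptions for illustration only.

```python
# Illustrative sketch only: model format, names, and threshold are hypothetical.
import torch

def run_liveness_check(model_path: str, face_image: torch.Tensor) -> bool:
    """Run the trained target living body detection model on one captured face image."""
    model = torch.jit.load(model_path)   # assumed: model exported as TorchScript
    model.eval()
    with torch.no_grad():
        # assumed: the model outputs a single liveness logit for the input image
        score = torch.sigmoid(model(face_image.unsqueeze(0))).item()
    return score > 0.5                   # True: live subject; False: suspected living body attack
```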
In one or more embodiments of the present disclosure, a target scene property description text is determined based on first sample object detection images in the target scene, and artificial intelligence image generation is performed by combining the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images.
Optionally, referring to fig. 4, fig. 4 is a schematic flow diagram illustrating generation of a scene property description according to one or more embodiments of the present disclosure. Specific:
s202: inputting each first sample object image into a scene property description generation model, and outputting at least one target scene commonality description information and at least one target scene characteristic description information;
This embodiment explains how to accurately extract the new scene description text, that is, the target scene commonality description information and the target scene characteristic description information, from the first sample object images, and how to perform the related scene text description extraction by using a scene property description generation model based on commonality-characteristic decoupled modeling.
The target scene property description text includes a target scene commonality description text and a target scene characteristic description text.
For the same scene, different images in the scene share common description features, such as an outdoor feature or a dim-light feature, and the target scene commonality description text is used to characterize these common description features;
for different images of the same scene, there may also be certain characteristic features (caused by differences in deployment angle and deployment position), such as a special building in the scene, and the target scene characteristic description information is used to characterize these characteristic description features.
Specifically, a scene property description generation model is obtained through creation and training; each first sample object image is input into the scene property description generation model, and at least one piece of target scene commonality description information and at least one piece of target scene characteristic description information are output.
It can be appreciated that, for a new scene, first sample object images of the new scene, that is, the target scene, are acquired, for example, 5-10 first sample object images for each type of client device used for living body detection. Assuming that N first sample object images are acquired in total, N pieces of target scene commonality description information and N pieces of target scene characteristic description information can be obtained in the above manner;
s204: and determining a target scene property description text based on the target scene commonality description information and the target scene property description information.
Specifically, description and summarization are carried out on the target scene commonality description information and the target scene characteristic description information, and a target scene property description text is obtained.
In one possible implementation, the target scene commonality description information and the target scene characteristic description information may be directly used as the target scene property description text.
In a possible implementation manner, the determining of the target scene property description text based on the target scene commonality description information and the target scene characteristic description information may be:
a2: sampling the description text of the target scene commonality description information to obtain a target scene commonality description text;
In one possible implementation, a data sampling algorithm can be adopted to sample the description texts of all the target scene commonality description information;
in a possible implementation manner, a text frequency corresponding to each common description text can be determined based on the target scene common description information, at least one reference common description text is determined from the common description texts based on the text frequency, and the reference common description texts are subjected to description text sampling to obtain the target scene common description text.
Optionally, determining at least one reference commonality description text from the commonality description texts based on the text frequency may be taking the commonality description texts whose text frequency is greater than a set frequency threshold (for example, the frequency threshold may be N/2) as the reference commonality description texts, that is, taking out the commonality description texts with higher frequency as the reference commonality description texts. In this way, the texts are screened; for a new scene, the selected reference commonality description texts better match the scene commonality, and erroneous interference can be filtered out.
A4: sampling the description text of the target scene characteristic description information to obtain a target scene characteristic description text;
In one possible implementation, a data sampling algorithm can be adopted to sample the description texts of all the target scene characteristic description information;
in a possible implementation manner, a text frequency corresponding to each characteristic description text can be determined based on the target scene characteristic description information, at least one reference characteristic description text is determined from the characteristic description texts based on the text frequency, and the reference characteristic description texts are subjected to description text sampling to obtain the target scene characteristic description text.
Optionally, determining at least one reference characteristic description text from the characteristic description texts based on the text frequency may be taking the characteristic description texts whose text frequency is greater than a set frequency threshold (for example, the frequency threshold may be N/2) as the reference characteristic description texts, that is, taking out the characteristic description texts with higher frequency as the reference characteristic description texts. In this way, the texts are screened; for a new scene, the selected reference characteristic description texts better match the scene characteristics, and erroneous interference can be filtered out.
A6: obtaining a target scene property description text based on the target scene commonality description text and the target scene characteristic description text.
Illustratively, a number of target scene commonality description texts and target scene characteristic description texts are randomly sampled, and the sampled texts can be used as the target scene property description text for generating object detection images in the new scene;
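A minimal Python sketch of the frequency-based screening and random sampling described above is given below; the N/2 threshold follows the example in the text, while the function name, the fallback when no text passes the threshold, and the joining of the sampled texts into one prompt are assumptions.

```python
# Hedged sketch of frequency-based screening and random sampling of scene description texts.
import random
from collections import Counter

def sample_scene_prompt(common_texts, specific_texts, k=3):
    n = len(common_texts)                      # N first sample object images -> N texts of each kind
    def screen(texts):
        counts = Counter(texts)
        refs = [t for t, c in counts.items() if c > n / 2]   # keep only high-frequency texts
        return refs or list(counts)            # assumed fallback: keep all if none pass
    common_refs = screen(common_texts)
    specific_refs = screen(specific_texts)
    sampled = (random.sample(common_refs, min(k, len(common_refs))) +
               random.sample(specific_refs, min(k, len(specific_refs))))
    return ", ".join(sampled)                  # one target scene property description prompt
```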
in one or more embodiments of the present disclosure, in order to reduce the dependence on data collection and manual annotation in cross-scene adaptation, only a small number of sample object images in the new scene need to be collected to determine the scene property description text of the new scene, which assists the subsequent batch generation of sample images in the new scene, thereby achieving the purpose of adapting the living body detection model to the new scene. The new scene description text is accurately extracted on the basis of only a small number of sample object images, and sample images of similar scenes are subsequently generated in batches from the description text of the new scene, thereby guaranteeing the subsequent model training effect.
Optionally, referring to fig. 5, fig. 5 is a schematic diagram of the model training of the scene property description generation model provided in one or more embodiments of the present disclosure.
S3002: creating an initial scene property description generation model;
in one or more embodiments of the present disclosure, an initial scene property description generation model may be created based on a machine learning model in response to a living body detection task, model training may be performed to obtain a scene property description generation model, and the scene property description generation model may be used to generate a scene property description text;
Optionally, the initial scene property description generating model may include a basic feature encoding module, a commonality text description generating module and a characteristic text description generating module, and the sample scene property description text includes a sample scene commonality description text and a sample scene characteristic description text;
s3004: acquiring a sample object detection image corresponding to at least one sample scene, wherein the sample object detection image carries a scene property description text label;
the sample scenes are configured based on the environment scenes to which living body detection is applied;
specifically, a data labeling mode can be adopted in advance to label the description of the acquired sample object detection image, so as to obtain a scene property description text label; for example, expert service can be introduced to carry out data annotation, and characteristics and commonalities of the sample object detection image are described to obtain a scene property description text label.
Illustratively, the scene property description text labels include a scene commonality description text label and a scene characteristic description text label.
S3006: performing at least one round of model training on the initial scene property description generation model by adopting the sample object detection image to obtain a sample scene property description text;
Illustratively, the sample object detection image may be input into the initial scene property description generation model for at least one round of model training, as follows:
b2: inputting the sample object detection image into an initial scene property description generation model, and extracting features of the sample object detection image through the basic feature coding module to obtain sample image features;
the input of the basic feature coding module is a sample object detection image, and the basic coding module performs feature coding extraction on the sample object detection image to obtain sample image features.
The sample image features may include a combination of one or more of object image feature types such as texture features, color features, shape features, and spatial relationship features.
B4: carrying out common description on the sample image features through the common text description generation module to obtain a sample scene common description text;
the input of the common text description generation module is the sample image feature of the last step, and the common text description generation module analyzes the commonality in the scene image based on the sample image feature, so that the sample scene commonality description text is obtained.
B6: and carrying out characteristic description on the sample image characteristics through the characteristic text description generating module to obtain a sample scene characteristic description text.
The input of the characteristic text description generating module is the characteristic of the sample image in the last step, and the characteristic text description generating module analyzes the characteristics in the scene image based on the characteristic of the sample image, so that the sample scene characteristic description text is obtained.
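To make the module structure above concrete, the following PyTorch-style sketch outlines a scene property description generation model with a basic feature encoding module and two decoupled description heads; the layer sizes, the use of text embeddings as outputs, and all names are illustrative assumptions rather than the specification's required architecture.

```python
# Structural sketch of commonality/characteristic decoupled modeling (all sizes assumed).
import torch
import torch.nn as nn

class ScenePropertyDescriptionModel(nn.Module):
    def __init__(self, feat_dim=512, text_emb_dim=256):
        super().__init__()
        # basic feature encoding module (placeholder backbone)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, feat_dim))
        self.common_head = nn.Linear(feat_dim, text_emb_dim)    # commonality text description module
        self.specific_head = nn.Linear(feat_dim, text_emb_dim)  # characteristic text description module

    def forward(self, image):
        feat = self.encoder(image)               # sample image features
        return self.common_head(feat), self.specific_head(feat)
```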
S3008: and adjusting model parameters of the initial scene property description generation model based on the scene property description text labels and the sample scene property description text until the initial scene property description generation model completes model training, so as to obtain a scene property description generation model.
Illustratively, the scene property description text labels include scene commonality description text labels and scene characteristic description text labels;
further, the adjustment process of the model parameter adjustment may be as follows:
c2, determining a common text description generation loss based on the sample scene common description text and the scene common description text label;
and the common text description generation loss is used as a supervision signal generated by a common description in the scene property description generation process, the model training effect is supervised from the common description dimension, and the common text description generation loss can be obtained by adopting a related loss calculation function based on the sample scene common description text and the scene common description text label.
For example, the commonality text description generation loss may be derived using a loss function of the following form:
LOSS_common = D(emb_c-pred, emb_c-label)
wherein LOSS_common is the commonality text description generation loss, emb_c-pred is the sample scene commonality description text, emb_c-label is the scene commonality description text label, and D(·,·) is the adopted loss calculation function;
c4, determining a characteristic text description generation loss based on the sample scene characteristic description text and the scene characteristic description text label;
the characteristic text description generation loss is used as a supervision signal for characteristic description generation in the scene property description generation process and supervises the model training effect from the characteristic description dimension; the characteristic text description generation loss can be obtained by adopting a related loss calculation function based on the sample scene characteristic description text and the scene characteristic description text label.
For example, the characteristic text description generation loss may be derived using a loss function of the following form:
LOSS_specific = D(emb_s-pred, emb_s-label)
wherein LOSS_specific is the characteristic text description generation loss, emb_s-pred is the sample scene characteristic description text, emb_s-label is the scene characteristic description text label, and D(·,·) is the adopted loss calculation function;
and C6, adjusting model parameters of the initial scene property description generation model based on the commonality text description generation loss and the characteristic text description generation loss.
It can be understood that the model comprehensive loss of the initial scene property description generation model can be obtained based on the commonality text description generation loss and the characteristic description text generation loss, and the model parameter adjustment is performed on the initial scene property description generation model by adopting a model back propagation mode based on the model comprehensive loss until the model finishing training condition is met, so that the trained scene property description generation model can be obtained.
The model end training condition of the model may include, for example, that the value of the loss function is less than or equal to a preset loss function threshold, the iteration number reaches a preset number of times threshold, and so on. The specific model end training conditions may be determined based on actual conditions and are not specifically limited herein.
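The following sketch shows one possible training step for the model above, assuming that the scene property description text labels are available as embedding vectors and that a Euclidean (MSE-style) distance plays the role of the related loss calculation function; neither choice is mandated by the specification.

```python
# Hedged sketch of one parameter-adjustment step (loss choice is an assumption).
import torch.nn.functional as F

def train_step(model, optimizer, images, emb_c_label, emb_s_label):
    emb_c_pred, emb_s_pred = model(images)
    loss_common = F.mse_loss(emb_c_pred, emb_c_label)     # commonality text description generation loss
    loss_specific = F.mse_loss(emb_s_pred, emb_s_label)   # characteristic text description generation loss
    loss = loss_common + loss_specific                    # model comprehensive loss
    optimizer.zero_grad()
    loss.backward()                                       # model back propagation
    optimizer.step()
    return loss.item()
```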
In one or more embodiments of the present disclosure, in order to reduce the dependence on data acquisition and manual annotation in cross-scene adaptation, a scene property description generation model is trained. With this model, the scene property description text of a new scene can be determined while only a small number of sample object images in the new scene are acquired, which assists the subsequent batch generation of sample images in the new scene and achieves the purpose of adapting the living body detection model to the new scene. The new scene description text is accurately extracted on the basis of only a small number of sample object images, and sample images of similar scenes are subsequently generated in batches from the description text of the new scene, thereby guaranteeing the subsequent model training effect.
Optionally, the electronic device performs artificial intelligence image generation based on the target scene property description text and the first sample object detection images to obtain the plurality of second sample object detection images, which may be as follows:
the target scene property description text and the first sample object detection images are input into a scene living body sample generation model, and a plurality of second sample object detection images are output.
Referring to fig. 6, fig. 6 is a schematic diagram of model training of a scene living sample generation model according to one or more embodiments of the present disclosure.
S4002: creating an initial scene living body sample generation model;
in one or more embodiments of the present disclosure, an initial scene living sample generation model may be created based on a machine learning model in response to a living detection task, model training may be performed to obtain a scene living sample generation model, and living sample generation under a new scene may be performed using the scene living sample generation model;
schematically, in practical applications, an AIGC-based image generation method directly takes text as input and generates images under text control; however, the generated images have large randomness and are difficult to control. Therefore, for the highly normalized images in a living body detection scene, a scheme that generates images directly based on AIGC cannot meet the requirements. To improve on this, a new scene living body sample generation method based on material subject locking is introduced in the training phase of the scene living body sample generation model. This method confirms the image subject by inputting reference materials (various living faces, attack materials, and the like), fine-tunes the subject with text, generates the scene using the text obtained in the previous step at the same time, and then fuses the results to obtain the final living body detection image of the new scene, so that the generated images are more controllable and of better quality.
In a possible implementation manner, the initial scene living body sample generation model may at least include a material locking module, a scene generation module, a fusion module and a feature extraction module;
s4004: acquiring at least one reference material image, and determining a material scene description text and a material main body adjustment text corresponding to the reference material image;
the reference material images are divided by living body/attack type at the acquisition stage, that is, each reference material image is determined to belong to either a living body material type or an attack material type, and a material scene description text and a material main body adjustment text are configured for each reference material image before model training;
the material main body adjustment text is main body adjustment information for the material main body of the reference material image, such as backlighting the main body. The material main body adjustment text can be configured through manual customization, and introducing the material main body adjustment text can quickly improve the processing capability and robustness of the model;
the material scene description text is scene description information for the characteristics and commonalities of the reference material image, and an expert service can be introduced so that it is configured by an expert terminal. The material scene description text can also be generated by using the scene property description generation model: a reference material image is input into the scene property description generation model, and the output of the model is used as the material scene description text;
S4006: and inputting the reference material image, the material main body adjustment text and the material scene description text into the initial scene living sample generation model to perform at least one round of model training, so as to obtain a trained scene living sample generation model.
The initial scene living body sample generation model at least comprises a material locking module, a scene generation module, a fusion module and a feature extraction module;
in the model training process, the module structure and model parameters of the feature extraction module can be kept unchanged, and the model parameters of the remaining material locking module, scene generation module, and fusion module are adjusted.
In one possible embodiment, performing at least one round of model training on the initial scene living body sample generation model may take the following form:
D2: inputting the reference material image and the material scene description text into the initial scene living body sample generation model; performing, by the material locking module, image main body adjustment on the reference material image based on the material main body adjustment text to obtain a material adjustment image; performing, by the scene generation module, scene generation processing based on the material scene description text to obtain a reference scene image; and performing, by the fusion module, image fusion processing based on the material adjustment image and the reference scene image to obtain a sample object detection image;
Each round of input to the material locking module is a reference material image, which specifically belongs to the living body material type or the attack material type, and a material main body adjustment text. The material locking module parses the adjustment semantics of the material main body adjustment text, determines the image main body after analyzing the reference material image, and adjusts the image main body in the reference material image according to the adjustment semantics, thereby obtaining the module output: a material adjustment image that satisfies the material main body adjustment text;
each round of input of the scene generation module is a material scene description text, the scene generation module carries out AIGC image generation based on the material scene description text, and AI generates a reference scene image meeting the material scene description text; alternatively, the scene generation module may use an AIGC model in the related art;
each round of input of the fusion module is the material adjustment image and the reference scene image, and the fusion module carries out image fusion on the material adjustment image and the reference scene image to generate a new living body detection image serving as a sample object detection image;
d4, determining material adjustment image characteristics of the material adjustment image, reference material image characteristics of the reference material image, material adjustment text characteristics of the material main body adjustment text, material scene description text characteristics of the material scene description text, reference scene image characteristics of the reference scene image and sample object image characteristics of the sample object detection image through the characteristic extraction module;
Illustratively, a feature extraction module is configured for the initial scene living body sample generation model. The feature extraction module can remain unchanged during the model training phase and is configured to perform feature extraction so as to further generate at least one model training supervisory signal, and the model training process is supervised based on the model training supervisory signal.
And D6, adjusting model parameters of an initial scene living sample generation model based on the material adjustment image features, the reference material image features, the material adjustment text features, the material scene description text features, the reference scene image features and the sample object image features until the initial scene living sample generation model is trained, so as to obtain a scene living sample generation model.
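A structural sketch of this generation-and-feature-extraction forward pass is given below in Python; the four sub-modules are treated as black boxes, the text features are produced by an assumed text encoder, and all names and interfaces are illustrative assumptions.

```python
# Structural sketch of the scene living body sample generation model (interfaces assumed).
import torch.nn as nn

class SceneLiveSampleGenerator(nn.Module):
    def __init__(self, material_locker, scene_generator, fusion_module,
                 feature_extractor, text_encoder):
        super().__init__()
        self.material_locker = material_locker      # adjusts the image main body per the adjustment text
        self.scene_generator = scene_generator      # AIGC-style generator driven by the scene text
        self.fusion_module = fusion_module          # fuses adjusted material and generated scene
        self.feature_extractor = feature_extractor  # kept frozen; yields supervision features
        self.text_encoder = text_encoder            # assumed helper for the text features

    def forward(self, ref_material_img, subject_adjust_text, scene_desc_text):
        adjusted_img = self.material_locker(ref_material_img, subject_adjust_text)
        scene_img = self.scene_generator(scene_desc_text)
        sample_img = self.fusion_module(adjusted_img, scene_img)
        feats = {
            "f_img1_g": self.feature_extractor(adjusted_img),     # material adjustment image features
            "f_img_ori": self.feature_extractor(ref_material_img),
            "f_img2_g": self.feature_extractor(scene_img),
            "f_img3_g": self.feature_extractor(sample_img),
            "f_text1": self.text_encoder(subject_adjust_text),    # material adjustment text features
            "f_text2": self.text_encoder(scene_desc_text),        # material scene description text features
        }
        return sample_img, feats
```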
Illustratively, based on the material adjustment image features, the reference material image features, the material adjustment text features, the material scene description text features, the reference scene image features and the sample object image features, material main body locking loss, scene constraint loss and image fusion loss can be calculated respectively, so that model comprehensive loss is obtained, and model parameter adjustment is carried out on an initial scene living sample generation model through the model comprehensive loss;
E2: determining a material body locking loss based on the material adjustment image feature and the reference material image feature, the material adjustment image feature and the material adjustment text feature;
the material main body locking loss is set to be used as a consistency monitoring signal in a material locking process, the characteristics of a material adjustment image and the characteristics of a reference material image are monitored to be consistent as much as possible in a model training process, and the characteristics of the reference material image and the characteristics of a material adjustment text are monitored to be consistent as much as possible. The material main body locking loss can be obtained by adopting a related loss calculation function (such as a Euclidean distance loss calculation function, a hinge loss calculation function and the like);
for example, the material body lock loss can be calculated using the following loss calculation formula:
LOSS_lock = D(f_img1-g, f_img-ori) + D(f_img1-g, f_text)
wherein LOSS_lock is the material main body locking loss, f_img1-g is the material adjustment image feature, f_img-ori is the reference material image feature, f_text is the material adjustment text feature, and D(·,·) is the adopted loss calculation function (for example, a Euclidean distance loss);
e4: determining a scene constraint loss based on the reference scene image features and the material adjustment text features;
the scene constraint loss is set as a consistency supervision signal in the scene image generation process, and the consistency between the reference scene image characteristics and the material adjustment text characteristics is supervised in the model training process. In practical application, the reference scene image feature and the material adjustment text feature are usually two high-dimensional feature vectors, the two high-dimensional feature vectors are mapped into a vector space with the same dimension, scene constraint loss can be accurately calculated through a loss calculation function in a vector form, and the scene constraint loss can be obtained through a related loss calculation function (such as a Euclidean distance loss calculation function, a hinge loss calculation function and the like);
For example, the scene constraint loss may be calculated using the following loss calculation formula:
LOSS_scene = D(f_img2-g, f_text)
wherein LOSS_scene is the scene constraint loss, f_img2-g is the reference scene image feature, f_text is the material adjustment text feature, and D(·,·) is the adopted loss calculation function;
e6: determining an image fusion loss based on the sample object image features, the material adjustment text features, and the material scene description text features;
the image fusion loss is set as a consistency monitoring signal in the image fusion process, the image characteristics of the fused sample object, the material adjustment text characteristics and the material scene description text characteristics are monitored to be consistent as much as possible in the model training process, in practical application, the three characteristics are usually three high-dimensional characteristic vectors, the three high-dimensional characteristic vectors are mapped into a vector space with the same dimension, the image fusion loss can be accurately calculated through a vector form loss calculation function, and the image fusion loss can be obtained through a related loss calculation function (such as a Euclidean distance loss calculation function, a hinge loss calculation function and the like);
for example, the image fusion loss can be calculated using the following loss calculation formula:
LOSS_fusion = D(f_img3-g, f_text1) + D(f_img3-g, f_text2)
wherein LOSS_fusion is the image fusion loss, f_img3-g is the sample object image feature, f_text1 is the material adjustment text feature, f_text2 is the material scene description text feature, and D(·,·) is the adopted loss calculation function;
e8: and performing model parameter adjustment on an initial scene living body sample generation model based on the material body locking loss, the scene constraint loss and the image fusion loss.
It can be understood that a comprehensive model loss for the initial scene living sample generation model can be obtained from the material body locking loss, the scene constraint loss and the image fusion loss. Model parameters of the initial scene living sample generation model are then adjusted by back propagation based on this comprehensive loss until the training end condition is met, at which point the trained scene living sample generation model is obtained.
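A minimal sketch of one such update step is shown below; the equal weighting of the three losses, the input keys and the helper loss objects are assumptions, since the text does not specify how the comprehensive loss is composed:

```python
def generation_training_step(model, optimizer, batch, losses):
    """One hypothetical parameter update for the initial scene living sample
    generation model. `losses` bundles the three consistency losses above."""
    feats = model(batch["reference_material_image"],
                  batch["material_body_adjust_text"],
                  batch["material_scene_description_text"])
    total = (losses.locking(feats)    # material body locking loss
             + losses.scene(feats)    # scene constraint loss
             + losses.fusion(feats))  # image fusion loss
    optimizer.zero_grad()
    total.backward()                  # model back propagation
    optimizer.step()                  # model parameter adjustment
    return total.item()               # comprehensive model loss value
```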
The training end condition of the model may include, for example, the value of the loss function falling to or below a preset loss threshold, the number of iterations reaching a preset threshold, and so on. The specific training end condition may be determined according to the actual situation and is not specifically limited herein.
In one or more embodiments of the present disclosure, existing AIGC schemes cannot meet the requirement of generating the highly standardized images used in living body detection. Training a scene living body sample generation model meets the requirement of automatically generating images from descriptive text: the introduced material-locking-based new-scene living sample generation method fixes the subject from input reference materials (various live faces, attack materials, etc.), fine-tunes the subject with text, generates the scene from the text of the previous step in parallel, and then fuses the results into the final new-scene living detection image. Images generated in this way are more controllable and of better quality. After model training is completed, new-scene samples can be generated in batches, and a living detection model for the new scene is finally trained on the generated samples.
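A high-level sketch of that generation pipeline is given below; the module and argument names mirror the material locking, scene generation and fusion modules described earlier and are placeholders rather than the actual implementation:

```python
def generate_new_scene_sample(lock_module, scene_module, fusion_module,
                              reference_material_image,
                              body_adjust_text, scene_description_text):
    """Hypothetical forward pass of the scene living sample generation model."""
    # 1. Lock the subject: adjust the reference material image with the body text.
    material_adjust_image = lock_module(reference_material_image, body_adjust_text)
    # 2. Generate the scene from the material scene description text.
    reference_scene_image = scene_module(scene_description_text)
    # 3. Fuse subject and scene into the new-scene detection image.
    return fusion_module(material_adjust_image, reference_scene_image)
```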
Optionally, performing model training on the initial target living body detection model by using the second sample object detection image to obtain a target living body detection model;
Furthermore, considering that the quality of the generated samples still varies, this stage provides a new-scene living detection model training method based on quality-adaptive selection: when the initial target living detection model is created, its structure is configured as a new-scene living detection model with quality-adaptive selection, which further improves the quality and effect of living detection;
Optionally, the initial target living detection model may include a quality adaptive module, a soft gate module, and a living detection module based on the reference living detection model;
In this specification, the model training of the initial target living body detection model using the second sample object detection image to obtain a target living body detection model may proceed as follows:
referring to fig. 7, fig. 7 is a schematic diagram of a training flow of a target living body detection model;
s5002: inputting the second sample object detection image into the initial target living body detection model, performing sample quality evaluation processing on the second sample object detection image through the quality self-adaptive module to obtain a sample quality score, determining sample training weights for the second sample object detection image by adopting a soft gate module based on the sample quality score, and performing living body detection on the second sample object detection image through the living body detection module to obtain a living body detection result;
The quality adaptive module can be a network module adopting an acceptance quality model; it performs image quality evaluation on the second sample object detection image input in each round to calculate the sample quality score;
Optionally, the quality adaptive module may be created based on a pre-trained acceptance quality model; its module architecture and model parameters are kept unchanged during the model training phase and are used only to calculate the image quality score of each second sample object detection image.
The soft gate module (soft gating module) performs sample weight evaluation on the second sample object detection image input in each round to obtain the sample training weight, which serves as a weight distribution supervision signal instructing the living body detection module to train in a targeted manner;
Optionally, the soft gate module may be created based on a pre-trained soft gate network; its module architecture and model parameters likewise remain unchanged during the model training phase and are used only to calculate the sample training weight of each second sample object detection image. During back propagation, the weight with which each sample contributes to the living body detection module of the model is the weight output by the soft gate module, so the soft gate module controls how much of the output information of the other neurons passes through;
The living body detection module may be created based on a reference living body detection model that has been trained in the reference scene. Its input in each round is a second sample object detection image, on which it performs living body detection to obtain a living body detection result.
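A minimal sketch of this forward pass, with the quality adaptive module and the soft gate module frozen as described above (the module interfaces are assumptions), might look as follows:

```python
import torch

@torch.no_grad()
def frozen_sample_weights(quality_module, soft_gate_module, images):
    """The quality adaptive module and soft gate module are frozen: they only
    score each second sample object detection image and map the score to a
    sample training weight."""
    quality_scores = quality_module(images)        # sample quality scores
    return soft_gate_module(quality_scores)        # sample training weights

def target_detection_forward(quality_module, soft_gate_module, liveness_module, images):
    weights = frozen_sample_weights(quality_module, soft_gate_module, images)
    logits = liveness_module(images)               # living detection result
    return logits, weights
```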
S5004: and performing model parameter adjustment on the living body detection module of the initial target living body detection model based on the sample training weight and the living body detection result to obtain a target living body detection model.
Illustratively, the comprehensive model loss of the initial target living body detection model may be determined from the weight sparsity loss and the living body detection loss;
specifically, a weight sparsity penalty may be determined based on the sample training weights for each of the second sample object detection images;
The weight sparsity loss keeps the sum of the training weights of the second sample object detection images in each batch as small as possible, so that only a portion of the high-quality samples effectively participates in training the living body detection module.
Specifically, a living body detection tag of the second sample object detection image may be acquired, and living body detection loss is determined based on the living body detection result and the living body detection tag;
Specifically, model parameter adjustment may be performed on the living body detection module of the initial target living body detection model based on the weight sparsity loss and the living body detection loss.
It can be understood that a comprehensive model loss for the initial target living body detection model can be obtained from the weight sparsity loss and the living body detection loss. Model parameters of the living body detection module of the initial target living body detection model are then adjusted by back propagation based on this comprehensive loss until the training end condition is met, at which point the trained target living body detection model is obtained.
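A minimal sketch of this comprehensive loss is shown below, assuming a binary cross-entropy living detection loss and an L1-style sparsity term with a tunable coefficient; neither choice is specified in the text:

```python
import torch
import torch.nn.functional as F

def detection_training_loss(logits: torch.Tensor,
                            labels: torch.Tensor,
                            weights: torch.Tensor,
                            sparsity_coeff: float = 0.1) -> torch.Tensor:
    """Hypothetical comprehensive loss for the initial target living detection
    model: per-sample living detection loss weighted by the soft gate output,
    plus a sparsity term that keeps the sum of batch weights small."""
    per_sample = F.binary_cross_entropy_with_logits(logits, labels, reduction="none")
    liveness_loss = (weights * per_sample).mean()   # living detection loss
    sparsity_loss = weights.sum()                   # weight sparsity loss
    return liveness_loss + sparsity_coeff * sparsity_loss
```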
The training end condition of the model may include, for example, the value of the loss function falling to or below a preset loss threshold, the number of iterations reaching a preset threshold, and so on. The specific training end condition may be determined according to the actual situation and is not specifically limited herein.
In one or more embodiments of the present disclosure, the target living detection model trained in the above manner can overcome the quality differences among the generated sample images, implementing a new-scene living detection model training method based on quality-adaptive selection and thereby ensuring the detection quality of the living detection model.
The living body detection apparatus provided in the present specification will be described in detail below with reference to fig. 8. The living body detection apparatus shown in fig. 8 is used to perform the methods of the embodiments shown in fig. 1 to 7 of the present specification; for convenience of explanation, only the portions relevant to the present specification are shown, and for the technical details not disclosed here, refer to the embodiments shown in fig. 1 to 7 of the present specification.
Referring to fig. 8, a schematic structural diagram of the living body detection apparatus of the present specification is shown. The living body detection apparatus 1 may be implemented as all or a part of the user terminal by software, hardware, or a combination of both. According to some embodiments, the living body detection apparatus 1 comprises a data processing module 11, an image generating module 12 and a living body detection module 13, in particular for:
a data processing module 11, configured to acquire first sample object detection images in a target scene, and determine a target scene property description text based on each of the first sample object detection images;
an image generation module 12, configured to perform artificial intelligence image generation based on the target scene property description text and the first sample object detection image, to obtain a plurality of second sample object detection images, where the number of samples of the second sample object detection images is greater than the number of samples of the first sample object detection image;
And the living body detection module 13 is configured to create an initial target living body detection model for the target scene based on a reference living body detection model, perform model training on the initial target living body detection model by using the second sample object detection image, and obtain a target living body detection model, where the reference living body detection model is a living body detection model under the reference scene.
Optionally, the data processing module is configured to:
inputting each first sample object image into a scene property description generation model, and outputting at least one target scene commonality description information and at least one target scene characteristic description information;
and determining a target scene property description text based on the target scene commonality description information and the target scene characteristic description information.
Optionally, the data processing module is configured to: sampling the description text of the target scene commonality description information to obtain a target scene commonality description text;
sampling the description text of the target scene characteristic description information to obtain a target scene characteristic description text;
and obtaining a target scene property description text based on the target scene commonality description text and the target scene characteristic description text.
Optionally, the data processing module is configured to: determining a text frequency corresponding to each common description text based on the target scene common description information, and determining at least one reference common description text from the common description texts based on the text frequency;
and performing description text sampling on each reference commonality description text to obtain a target scene commonality description text.
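As an illustrative sketch of this frequency-based selection (the top-k cutoff and the random sampling strategy are assumptions, not fixed by the text):

```python
from collections import Counter
import random

def sample_commonality_text(commonality_texts, top_k=5, num_samples=2):
    """Hypothetical sampling: count how often each commonality description
    text occurs, keep the top_k most frequent texts as reference commonality
    description texts, then sample from them to form the target scene
    commonality description text."""
    freq = Counter(commonality_texts)                    # text frequency
    reference = [t for t, _ in freq.most_common(top_k)]  # reference texts
    return random.sample(reference, min(num_samples, len(reference)))
```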
Optionally, the data processing module is configured to: creating an initial scene property description generation model;
acquiring a sample object detection image corresponding to at least one sample scene, wherein the sample object detection image carries a scene property description text label;
performing at least one round of model training on the initial scene property description generation model by adopting the sample object detection image to obtain a sample scene property description text;
and adjusting model parameters of the initial scene property description generation model based on the scene property description text labels and the sample scene property description text until the initial scene property description generation model completes model training, so as to obtain a scene property description generation model.
Optionally, the initial scene property description generation model includes a basic feature coding module, a commonality text description generation module and a characteristic text description generation module, and the sample scene property description text includes a sample scene commonality description text and a sample scene characteristic description text;
Optionally, the data processing module is configured to:
inputting the sample object detection image into an initial scene property description generation model, and extracting features of the sample object detection image through the basic feature coding module to obtain sample image features;
carrying out common description on the sample image features through the common text description generation module to obtain a sample scene common description text;
and carrying out characteristic description on the sample image characteristics through the characteristic text description generating module to obtain a sample scene characteristic description text.
Optionally, the scene property description text labels include scene commonality description text labels and scene characteristic description text labels,
the model parameter adjustment of the initial scene property description generation model based on the scene property description text tag and the sample scene property description text comprises:
determining a common text description generation penalty based on the sample scene common description text and the scene common description text label;
determining a feature text description generation penalty based on the sample scene feature description text and the scene feature description text label;
And adjusting model parameters of the initial scene property description generation model based on the commonality text description generation loss and the characteristic text description generation loss.
Optionally, the image generating module is configured to:
inputting the target scene property description text and the first sample object detection image into a scene living body sample generation model, and outputting a plurality of second sample object detection images.
Optionally, the image generating module is configured to: an initial scene living sample generation model is created,
acquiring at least one reference material image, and determining a material scene description text and a material main body adjustment text corresponding to the reference material image;
and inputting the reference material image, the material main body adjustment text and the material scene description text into the initial scene living sample generation model to perform at least one round of model training, so as to obtain a trained scene living sample generation model.
Optionally, the initial scene living body sample generation model includes a material locking module, a scene generation module, a fusion module and a feature extraction module, and the image generation module is used for:
inputting the reference material image and the material scene description text into the initial scene living body sample generation model, performing image main body adjustment on the reference material image based on the material main body adjustment text through the material locking module to obtain a material adjustment image, performing scene generation processing based on the material scene description text through the scene generation module to obtain a reference scene image, and performing image fusion processing based on the material adjustment image and the reference scene image through the fusion module to obtain a sample object detection image;
Determining, by the feature extraction module, a material adjustment image feature of the material adjustment image, a reference material image feature of the reference material image, a material adjustment text feature of the material body adjustment text, a material scene description text feature of the material scene description text, a reference scene image feature of the reference scene image, and a sample object image feature of the sample object detection image;
and adjusting model parameters of an initial scene living sample generation model based on the material adjustment image features, the reference material image features, the material adjustment text features, the material scene description text features, the reference scene image features and the sample object image features until the initial scene living sample generation model is trained, so as to obtain the scene living sample generation model.
Optionally, the image generating module is configured to: determining a material body locking loss based on the material adjustment image feature and the reference material image feature, the material adjustment image feature and the material adjustment text feature;
determining a scene constraint loss based on the reference scene image features and the material adjustment text features;
Determining an image fusion loss based on the sample object image features, the material adjustment text features, and the material scene description text features;
and performing model parameter adjustment on an initial scene living body sample generation model based on the material body locking loss, the scene constraint loss and the image fusion loss.
Optionally, the initial target living detection model includes a quality adaptive module, a soft gate module, and a living detection module based on the reference living detection model, the living detection module being configured to:
inputting the second sample object detection image into the initial target living body detection model, performing sample quality evaluation processing on the second sample object detection image through the quality self-adaptive module to obtain a sample quality score, determining sample training weights for the second sample object detection image by adopting a soft gate module based on the sample quality score, and performing living body detection on the second sample object detection image through the living body detection module to obtain a living body detection result;
and performing model parameter adjustment on the living body detection module of the initial target living body detection model based on the sample training weight and the living body detection result to obtain a target living body detection model.
Optionally, the living body detection module is configured to:
determining a weight sparsity penalty based on the sample training weights for each of the second sample object detection images;
acquiring a living body detection label of the second sample object detection image, and determining living body detection loss based on the living body detection result and the living body detection label;
model parameter adjustment is performed on the living detection module of the initial target living detection model based on the weight sparse loss and the living detection loss.
Optionally, the device 1 is further configured to:
and deploying the target living body detection model to the target scene so as to carry out living body detection processing on the target object detection image in the target scene through the target living body detection model.
It should be noted that, when the living body detection apparatus provided in the foregoing embodiments performs the living body detection method, the division into the above functional modules is merely an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the living body detection apparatus and the living body detection method provided in the foregoing embodiments belong to the same concept; for details of the implementation process, refer to the method embodiments, which are not repeated here.
The foregoing description is for illustration only and does not imply the relative merits of the embodiments.
The present disclosure further provides a computer storage medium storing a plurality of instructions adapted to be loaded by a processor to execute the living body detection method of the embodiments shown in fig. 1 to 7; for the specific execution process, refer to the descriptions of those embodiments, which are not repeated here.
The present disclosure further provides a computer program product storing at least one instruction, the at least one instruction being loaded and executed by a processor to perform the living body detection method of the embodiments shown in fig. 1 to 7; for the specific execution process, refer to the descriptions of those embodiments, which are not repeated here.
Referring to fig. 9, a block diagram of an electronic device according to an exemplary embodiment of the present disclosure is shown. The electronic device in this specification may include one or more of the following: processor 110, memory 120, input device 130, output device 140, and bus 150. The processor 110, the memory 120, the input device 130, and the output device 140 may be connected by a bus 150.
Processor 110 may include one or more processing cores. The processor 110 utilizes various interfaces and lines to connect various portions of the overall electronic device, perform various functions of the electronic device 100, and process data by executing or executing instructions, programs, code sets, or instruction sets stored in the memory 120, and invoking data stored in the memory 120. Alternatively, the processor 110 may be implemented in at least one hardware form of digital signal processing (digital signal processing, DSP), field-programmable gate array (field-programmable gate array, FPGA), programmable logic array (programmable logic Array, PLA). The processor 110 may integrate one or a combination of several of a central processor (central processing unit, CPU), an image processor (graphics processing unit, GPU), and a modem, etc. The CPU mainly processes an operating system, a user interface, an application program and the like; the GPU is used for being responsible for rendering and drawing of display content; the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 110 and may be implemented solely by a single communication chip.
The memory 120 may include a random access memory (random Access Memory, RAM) or a read-only memory (ROM). Optionally, the memory 120 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 120 may be used to store instructions, programs, code, sets of codes, or sets of instructions. The memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, which may be an Android (Android) system, including an Android system-based deep development system, an IOS system developed by apple corporation, including an IOS system-based deep development system, or other systems, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing various method embodiments described below, and the like. The storage data area may also store data created by the electronic device in use, such as phonebooks, audiovisual data, chat log data, and the like.
Referring to FIG. 10, the memory 120 may be divided into an operating system space in which the operating system is running and a user space in which native and third party applications are running. In order to ensure that different third party application programs can achieve better operation effects, the operating system allocates corresponding system resources for the different third party application programs. However, the requirements of different application scenarios in the same third party application program on system resources are different, for example, under the local resource loading scenario, the third party application program has higher requirement on the disk reading speed; in the animation rendering scene, the third party application program has higher requirements on the GPU performance. The operating system and the third party application program are mutually independent, and the operating system often cannot timely sense the current application scene of the third party application program, so that the operating system cannot perform targeted system resource adaptation according to the specific application scene of the third party application program.
In order to enable the operating system to distinguish specific application scenes of the third-party application program, data communication between the third-party application program and the operating system needs to be communicated, so that the operating system can acquire current scene information of the third-party application program at any time, and targeted system resource adaptation is performed based on the current scene.
Taking the Android system as an example of the operating system, as shown in fig. 11, the programs and data stored in the memory 120 may be organized into a Linux kernel layer 320, a system runtime library layer 340, an application framework layer 360 and an application layer 380, where the Linux kernel layer 320, the system runtime library layer 340 and the application framework layer 360 belong to the operating system space, and the application layer 380 belongs to the user space. The Linux kernel layer 320 provides the underlying drivers for the various hardware of the electronic device, such as display drivers, audio drivers, camera drivers, Bluetooth drivers, Wi-Fi drivers, power management, and the like. The system runtime library layer 340 provides the main feature support for the Android system through C/C++ libraries; for example, the SQLite library provides database support, the OpenGL/ES library provides 3D graphics support, the Webkit library provides browser kernel support, and so on. The system runtime library layer 340 also provides the Android runtime library (Android runtime), which mainly supplies core libraries that allow developers to write Android applications in the Java language. The application framework layer 360 provides various APIs that may be used to build applications, which developers can use to build their own applications, for example activity management, window management, view management, notification management, content providers, package management, call management, resource management and location management. At least one application runs in the application layer 380; these may be native applications of the operating system, such as a contacts program, a messaging program, a clock program or a camera application, or third-party applications developed by third-party developers, such as games, instant messaging programs or photo beautification programs.
Taking an operating system as an IOS system as an example, the program and data stored in the memory 120 are shown in fig. 12, the IOS system includes: core operating system layer 420 (Core OS layer), core service layer 440 (Core Services layer), media layer 460 (Media layer), and touchable layer 480 (Cocoa Touch Layer). The core operating system layer 420 includes an operating system kernel, drivers, and underlying program frameworks that provide more hardware-like functionality for use by the program frameworks at the core services layer 440. The core services layer 440 provides system services and/or program frameworks required by the application, such as a Foundation (Foundation) framework, an account framework, an advertisement framework, a data storage framework, a network connection framework, a geographic location framework, a sports framework, and the like. The media layer 460 provides an interface for applications related to audiovisual aspects, such as a graphics-image related interface, an audio technology related interface, a video technology related interface, an audio video transmission technology wireless play (AirPlay) interface, and so forth. The touchable layer 480 provides various commonly used interface-related frameworks for application development, with the touchable layer 480 being responsible for user touch interactions on the electronic device. Such as a local notification service, a remote push service, an advertisement framework, a game tool framework, a message User Interface (UI) framework, a User Interface UIKit framework, a map framework, and so forth.
Among the frameworks illustrated in fig. 12, those involved in most applications include, but are not limited to, the Foundation framework in the core services layer 440 and the UIKit framework in the touchable layer 480. The Foundation framework provides many basic object classes and data types and supplies the most basic system services for all applications, independently of the UI. The classes provided by the UIKit framework form a basic UI class library for creating touch-based user interfaces; iOS applications can build their UIs on the UIKit framework, so it provides the infrastructure for applications to construct user interfaces, draw, handle user interaction events, respond to gestures, and so on.
The manner and principle of implementing data communication between the third party application program and the operating system in the IOS system may refer to the Android system, and this description is not repeated here.
The input device 130 is configured to receive input instructions or data, and includes, but is not limited to, a keyboard, a mouse, a camera, a microphone, or a touch device. The output device 140 is used to output instructions or data, and includes, but is not limited to, a display device, a speaker, and the like. In one example, the input device 130 and the output device 140 may be combined into a touch display screen, which receives touch operations performed by a user on or near it with a finger, a stylus, or any other suitable object, and displays the user interface of each application program. The touch display screen is typically provided on the front panel of the electronic device. It may be designed as a full screen, a curved screen, or a special-shaped screen, or as a combination of a full screen and a curved screen or of a special-shaped screen and a curved screen, which is not limited in this specification.
In addition, those skilled in the art will appreciate that the configuration of the electronic device shown in the above-described figures does not constitute a limitation of the electronic device, and the electronic device may include more or less components than illustrated, or may combine certain components, or may have a different arrangement of components. For example, the electronic device further includes components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, a wireless fidelity (wireless fidelity, wiFi) module, a power supply, and a bluetooth module, which are not described herein.
In this specification, the execution subject of each step may be the electronic device described above. Optionally, the execution subject of each step is an operating system of the electronic device. The operating system may be an android system, an IOS system, or other operating systems, which is not limited in this specification.
The electronic device of the present specification may further have a display device mounted thereon, which may be any device capable of realizing a display function, for example: a cathode ray tube display (CRT), a light-emitting diode display (LED), an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP), and the like. A user may use the display device on the electronic device 101 to view displayed text, images, video, etc. The electronic device may be a smart phone, a tablet computer, a gaming device, an AR (Augmented Reality) device, an automobile, a data storage device, an audio playing device, a video playing device, a notebook, a desktop computing device, or a wearable device such as an electronic watch, electronic glasses, an electronic helmet, an electronic bracelet, an electronic necklace or electronic clothing.
In the electronic device shown in fig. 9, the processor 110 may be configured to call an application program stored in the memory 120, and specifically perform the following operations:
acquiring first sample object detection images in a target scene, and determining a target scene property description text based on each first sample object detection image;
generating artificial intelligent images based on the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images, wherein the number of samples of the second sample object detection images is greater than that of the first sample object detection images;
and creating an initial target living body detection model aiming at the target scene based on a reference living body detection model, and performing model training on the initial target living body detection model by adopting the second sample object detection image to obtain a target living body detection model, wherein the reference living body detection model is the living body detection model in the reference scene.
In one embodiment, the processor 110, when executing the determining the target scene property descriptive text based on each of the first sample object detection images, performs the following operations:
inputting each first sample object image into a scene property description generation model, and outputting at least one target scene commonality description information and at least one target scene characteristic description information;
And determining a target scene property description text based on the target scene commonality description information and the target scene characteristic description information.
In one embodiment, the processor 110, when executing the determining a target scene property description text based on the target scene commonality description information and the target scene characteristic description information, performs the following steps:
sampling the description text of the target scene commonality description information to obtain a target scene commonality description text;
sampling the description text of the target scene characteristic description information to obtain a target scene characteristic description text;
and obtaining a target scene property description text based on the target scene commonality description text and the target scene characteristic description text.
In one embodiment, the processor 110 performs the following steps when performing the descriptive text sampling of the target scene commonality description information:
determining a text frequency corresponding to each common description text based on the target scene common description information, and determining at least one reference common description text from the common description texts based on the text frequency;
and performing description text sampling on each reference commonality description text to obtain a target scene commonality description text.
In one embodiment, the processor 110, when executing the living body detection method, further performs the steps of:
creating an initial scene property description generation model;
acquiring a sample object detection image corresponding to at least one sample scene, wherein the sample object detection image carries a scene property description text label;
performing at least one round of model training on the initial scene property description generation model by adopting the sample object detection image to obtain a sample scene property description text;
and adjusting model parameters of the initial scene property description generation model based on the scene property description text labels and the sample scene property description text until the initial scene property description generation model completes model training, so as to obtain a scene property description generation model.
In one embodiment, the initial scene property description generation model includes a basic feature encoding module, a commonality text description generation module and a characteristic text description generation module, and the processor 110, when performing the at least one round of model training on the initial scene property description generation model using the sample object detection image to obtain a sample scene property description text, performs the following steps:
Inputting the sample object detection image into an initial scene property description generation model, and extracting features of the sample object detection image through the basic feature coding module to obtain sample image features;
carrying out common description on the sample image features through the common text description generation module to obtain a sample scene common description text;
and carrying out characteristic description on the sample image characteristics through the characteristic text description generating module to obtain a sample scene characteristic description text.
In one embodiment, the scene property description text tag includes a scene commonality description text tag and a scene characteristic description text tag, and the processor 110 performs the following steps when executing the model parameter adjustment for the initial scene property description generation model based on the scene property description text tag and the sample scene property description text:
determining a common text description generation penalty based on the sample scene common description text and the scene common description text label;
determining a feature text description generation penalty based on the sample scene feature description text and the scene feature description text label;
And adjusting model parameters of the initial scene property description generation model based on the commonality text description generation loss and the characteristic text description generation loss.
In one embodiment, the processor 110 performs the following steps when performing the artificial intelligence image generation based on the target scene property description text and the first sample object detection image to obtain a plurality of second sample object detection images:
inputting the target scene property description text and the first sample object detection image into a scene living body sample generation model, and outputting a plurality of second sample object detection images.
In one embodiment, the processor 110, when executing the living body detection method, further performs the steps of:
an initial scene living sample generation model is created,
acquiring at least one reference material image, and determining a material scene description text and a material main body adjustment text corresponding to the reference material image;
and inputting the reference material image, the material main body adjustment text and the material scene description text into the initial scene living sample generation model to perform at least one round of model training, so as to obtain a trained scene living sample generation model.
In one embodiment, the initial scene living sample generation model includes a material locking module, a scene generation module, a fusion module, and a feature extraction module, and the processor 110 performs at least one round of model training when performing the inputting the reference material image, the material body adjustment text, and the material scene description text into the initial scene living sample generation model, to obtain a trained scene living sample generation model, and performs the following steps:
inputting the reference material image and the material scene description text into the initial scene living body sample generation model, performing image main body adjustment on the reference material image based on the material main body adjustment text through the material locking module to obtain a material adjustment image, performing scene generation processing based on the material scene description text through the scene generation module to obtain a reference scene image, and performing image fusion processing based on the material adjustment image and the reference scene image through the fusion module to obtain a sample object detection image;
determining, by the feature extraction module, a material adjustment image feature of the material adjustment image, a reference material image feature of the reference material image, a material adjustment text feature of the material body adjustment text, a material scene description text feature of the material scene description text, a reference scene image feature of the reference scene image, and a sample object image feature of the sample object detection image;
And adjusting model parameters of an initial scene living sample generation model based on the material adjustment image features, the reference material image features, the material adjustment text features, the material scene description text features, the reference scene image features and the sample object image features until the initial scene living sample generation model is trained, so as to obtain the scene living sample generation model.
In one embodiment, the processor 110 performs the following steps in performing the model parameter adjustment of the initial scene living sample generation model based on the material adjustment image feature, the reference material image feature, the material adjustment text feature, the material scene description text feature, the reference scene image feature, and the sample object image feature:
determining a material body locking loss based on the material adjustment image feature and the reference material image feature, the material adjustment image feature and the material adjustment text feature;
determining a scene constraint loss based on the reference scene image features and the material adjustment text features;
determining an image fusion loss based on the sample object image features, the material adjustment text features, and the material scene description text features;
And performing model parameter adjustment on an initial scene living body sample generation model based on the material body locking loss, the scene constraint loss and the image fusion loss.
In one embodiment, the initial target living detection model includes a quality adaptive module, a soft gate module, and a living detection module based on the reference living detection model, and the processor 110 performs model training on the initial target living detection model using the second sample object detection image to obtain a target living detection model, and performs the following steps:
inputting the second sample object detection image into the initial target living body detection model, performing sample quality evaluation processing on the second sample object detection image through the quality self-adaptive module to obtain a sample quality score, determining sample training weights for the second sample object detection image by adopting a soft gate module based on the sample quality score, and performing living body detection on the second sample object detection image through the living body detection module to obtain a living body detection result;
and performing model parameter adjustment on the living body detection module of the initial target living body detection model based on the sample training weight and the living body detection result to obtain a target living body detection model.
In one embodiment, the processor 110 performs the following steps when performing the model parameter adjustment on the living body detection module of the initial target living body detection model based on the sample training weights and the living body detection results:
determining a weight sparsity penalty based on the sample training weights for each of the second sample object detection images;
acquiring a living body detection label of the second sample object detection image, and determining living body detection loss based on the living body detection result and the living body detection label;
model parameter adjustment is performed on the living detection module of the initial target living detection model based on the weight sparse loss and the living detection loss.
In one embodiment, the processor 110, when executing the living body detection method, further performs the following steps:
and deploying the target living body detection model to the target scene so as to carry out living body detection processing on the target object detection image in the target scene through the target living body detection model.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in accordance with the embodiments may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed may comprise the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.
It should be noted that, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals according to the embodiments of the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, the first sample object detection image, the second sample object detection image, and the like referred to in this specification are all acquired with sufficient authorization.
The foregoing disclosure is merely illustrative of the preferred embodiments of this specification and is not intended to limit its scope of protection; equivalent changes made in accordance with the claims of this specification still fall within its scope.

Claims (18)

1. A method of in vivo detection, the method comprising:
acquiring first sample object detection images in a target scene, and determining a target scene property description text based on each first sample object detection image;
generating artificial intelligent images based on the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images, wherein the number of samples of the second sample object detection images is greater than that of the first sample object detection images;
And creating an initial target living body detection model aiming at the target scene based on a reference living body detection model, and performing model training on the initial target living body detection model by adopting the second sample object detection image to obtain a target living body detection model, wherein the reference living body detection model is the living body detection model in the reference scene.
2. The method of claim 1, the determining target scene property descriptive text based on each of the first sample object detection images, comprising:
inputting each first sample object image into a scene property description generation model, and outputting at least one target scene commonality description information and at least one target scene characteristic description information;
and determining a target scene property description text based on the target scene commonality description information and the target scene characteristic description information.
3. The method of claim 2, the determining a target scene property description text based on the target scene commonality description information and the target scene characteristic description information, comprising:
sampling the description text of the target scene commonality description information to obtain a target scene commonality description text;
sampling the description text of the target scene characteristic description information to obtain a target scene characteristic description text;
And obtaining a target scene property description text based on the target scene commonality description text and the target scene characteristic description text.
4. The method of claim 3, the performing descriptive text sampling on the target scene commonality descriptive information, comprising:
determining a text frequency corresponding to each common description text based on the target scene common description information, and determining at least one reference common description text from the common description texts based on the text frequency;
and performing description text sampling on each reference commonality description text to obtain a target scene commonality description text.
5. The method of claim 2, the method further comprising:
creating an initial scene property description generation model;
acquiring a sample object detection image corresponding to at least one sample scene, wherein the sample object detection image carries a scene property description text label;
performing at least one round of model training on the initial scene property description generation model by adopting the sample object detection image to obtain a sample scene property description text;
and adjusting model parameters of the initial scene property description generation model based on the scene property description text labels and the sample scene property description text until the initial scene property description generation model completes model training, so as to obtain a scene property description generation model.
6. The method of claim 5, wherein the initial scene property description generation model comprises a basic feature encoding module, a commonality text description generation module, and a characteristic text description generation module, wherein the sample scene property description text comprises a sample scene commonality description text and a sample scene characteristic description text,
at least one round of model training is performed on the initial scene property description generation model by adopting the sample object detection image to obtain a sample scene property description text, and the method comprises the following steps:
inputting the sample object detection image into an initial scene property description generation model, and extracting features of the sample object detection image through the basic feature coding module to obtain sample image features;
carrying out common description on the sample image features through the common text description generation module to obtain a sample scene common description text;
and carrying out characteristic description on the sample image characteristics through the characteristic text description generating module to obtain a sample scene characteristic description text.
7. The method of claim 6, wherein the scene property description text labels include a scene commonality description text label and a scene characteristic description text label,
The model parameter adjustment of the initial scene property description generation model based on the scene property description text tag and the sample scene property description text comprises:
determining a common text description generation penalty based on the sample scene common description text and the scene common description text label;
determining a feature text description generation penalty based on the sample scene feature description text and the scene feature description text label;
and adjusting model parameters of the initial scene property description generation model based on the commonality text description generation loss and the characteristic text description generation loss.
8. The method of claim 1, wherein the artificial intelligence image generation based on the target scene property descriptive text and the first sample object detection image results in a plurality of second sample object detection images, comprising:
inputting the target scene property description text and the first sample object detection image into a scene living body sample generation model, and outputting a plurality of second sample object detection images.
9. The method of claim 8, the method further comprising:
an initial scene living sample generation model is created,
Acquiring at least one reference material image, and determining a material scene description text and a material main body adjustment text corresponding to the reference material image;
and inputting the reference material image, the material main body adjustment text and the material scene description text into the initial scene living sample generation model to perform at least one round of model training, so as to obtain a trained scene living sample generation model.
10. The method of claim 9, wherein the initial scene living sample generation model comprises a material locking module, a scene generation module, a fusion module, and a feature extraction module,
inputting the reference material image, the material main body adjustment text and the material scene description text into the initial scene living sample generation model for at least one round of model training to obtain a trained scene living sample generation model, wherein the method comprises the following steps of:
inputting the reference material image and the material scene description text into the initial scene living body sample generation model, performing image main body adjustment on the reference material image based on the material main body adjustment text through the material locking module to obtain a material adjustment image, performing scene generation processing based on the material scene description text through the scene generation module to obtain a reference scene image, and performing image fusion processing based on the material adjustment image and the reference scene image through the fusion module to obtain a sample object detection image;
Determining, by the feature extraction module, a material adjustment image feature of the material adjustment image, a reference material image feature of the reference material image, a material adjustment text feature of the material body adjustment text, a material scene description text feature of the material scene description text, a reference scene image feature of the reference scene image, and a sample object image feature of the sample object detection image;
and adjusting model parameters of an initial scene living sample generation model based on the material adjustment image features, the reference material image features, the material adjustment text features, the material scene description text features, the reference scene image features and the sample object image features until the initial scene living sample generation model is trained, so as to obtain the scene living sample generation model.
11. The method of claim 10, the model parameter adjustment of the initial scene living sample generation model based on the material adjustment image features, the reference material image features, the material adjustment text features, the material scene description text features, the reference scene image features, and the sample object image features, comprising:
Determining a material body locking loss based on the material adjustment image feature and the reference material image feature, the material adjustment image feature and the material adjustment text feature;
determining a scene constraint loss based on the reference scene image features and the material adjustment text features;
determining an image fusion loss based on the sample object image features, the story adjustment text features, and the story scene description text features;
and performing model parameter adjustment on an initial scene living body sample generation model based on the material body locking loss, the scene constraint loss and the image fusion loss.
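Illustrative note (not part of the claim): one plausible instantiation of the three losses in claim 11 measures agreement between embeddings with cosine similarity in a shared space. The distance measure is an assumption; `feats` is the dictionary produced by the generator sketch above.
```python
# Hypothetical forms of the material body locking, scene constraint and image
# fusion losses, built from the features listed in claims 10-11.
import torch.nn.functional as F

def generator_losses(feats):
    def align(a, b):
        # 1 - cosine similarity: small when the two embeddings agree.
        return 1.0 - F.cosine_similarity(a, b, dim=-1).mean()

    # Material body locking loss: keep the adjusted material close to the
    # reference material and consistent with the material adjustment text.
    lock_loss = align(feats["adjusted_img"], feats["ref_img"]) \
              + align(feats["adjusted_img"], feats["adjust_txt"])
    # Scene constraint loss, as recited: reference scene image vs. adjustment text.
    scene_loss = align(feats["scene_img"], feats["adjust_txt"])
    # Image fusion loss: the fused sample should reflect both text inputs.
    fusion_loss = align(feats["sample_img"], feats["adjust_txt"]) \
                + align(feats["sample_img"], feats["scene_txt"])
    return lock_loss + scene_loss + fusion_loss
```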
12. The method of claim 1, wherein the initial target living body detection model comprises a quality adaptation module, a soft gate module, and a living body detection module based on the reference living body detection model, and
the performing of model training on the initial target living body detection model by using the second sample object detection image to obtain a target living body detection model comprises:
inputting the second sample object detection image into the initial target living body detection model, performing sample quality evaluation processing on the second sample object detection image through the quality adaptation module to obtain a sample quality score, determining a sample training weight for the second sample object detection image through the soft gate module based on the sample quality score, and performing living body detection on the second sample object detection image through the living body detection module to obtain a living body detection result;
and performing model parameter adjustment on the living body detection module of the initial target living body detection model based on the sample training weight and the living body detection result to obtain the target living body detection model.
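Illustrative note (not part of the claim): claim 12's three modules can be arranged as below. The quality head, the soft gate (a sigmoid mapping quality scores to (0, 1) weights) and the simple featurisation are assumptions; the detection head is initialised from the reference-scene model.
```python
# Structural sketch of the initial target living body detection model.
import torch.nn as nn

class TargetLivenessModel(nn.Module):
    def __init__(self, reference_detector):
        super().__init__()
        self.detector = reference_detector              # living body detection module (from the reference model)
        self.quality = nn.Sequential(                   # quality adaptation module
            nn.LazyLinear(128), nn.ReLU(), nn.Linear(128, 1))
        self.soft_gate = nn.Sigmoid()                   # soft gate: quality score -> training weight in (0, 1)

    def forward(self, images):
        flat = images.flatten(1)                        # simplistic featurisation for the sketch
        quality_score = self.quality(flat)              # sample quality score
        weight = self.soft_gate(quality_score)          # sample training weight
        logits = self.detector(images)                  # living body detection result
        return logits, weight
```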
13. The method of claim 12, wherein the performing of model parameter adjustment on the living body detection module of the initial target living body detection model based on the sample training weight and the living body detection result comprises:
determining a weight sparsity loss based on the sample training weight of each of the second sample object detection images;
acquiring a living body detection label of the second sample object detection image, and determining a living body detection loss based on the living body detection result and the living body detection label;
and performing model parameter adjustment on the living body detection module of the initial target living body detection model based on the weight sparsity loss and the living body detection loss.
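Illustrative note (not part of the claim): one possible form of the claim 13 update uses an L1 term over the gate weights as the sparsity loss and a weight-modulated classification loss as the living body detection loss. Both choices are assumptions.
```python
# Hypothetical adaptation step for the detection module. To restrict the update
# to the living body detection module, the optimizer is built only over
# model.detector.parameters().
import torch.nn.functional as F

def adapt_detection_module(model, optimizer, images, labels):
    logits, weights = model(images)
    sparsity_loss = weights.abs().mean()                       # weight sparsity loss over the gate outputs
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    liveness_loss = (weights.squeeze(-1) * per_sample).mean()  # quality-weighted living body detection loss
    loss = liveness_loss + sparsity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```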
14. The method of claim 1, further comprising:
and deploying the target living body detection model to the target scene so as to carry out living body detection processing on the target object detection image in the target scene through the target living body detection model.
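Illustrative note (not part of the claim): after deployment only the detection path is used at inference time. The threshold, preprocessing and class convention below are assumptions.
```python
# Hypothetical inference path for the deployed target living body detection model.
import torch

@torch.no_grad()
def detect_liveness(model, image_batch, threshold=0.5):
    model.eval()
    logits, _ = model(image_batch)                 # training-time gate weights are ignored at inference
    probs = torch.softmax(logits, dim=-1)[:, 1]    # assumed probability of the "live" class
    return probs > threshold                       # True = live subject, False = spoof/attack
```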
15. A living body detection apparatus, the apparatus comprising:
a data processing module, configured to acquire first sample object detection images in a target scene and determine a target scene property description text based on each first sample object detection image;
an image generation module, configured to perform artificial intelligence image generation based on the target scene property description text and the first sample object detection images to obtain a plurality of second sample object detection images, the sample number of the second sample object detection images being greater than that of the first sample object detection images;
and a living body detection module, configured to create an initial target living body detection model for the target scene based on a reference living body detection model, and perform model training on the initial target living body detection model by using the second sample object detection images to obtain a target living body detection model, wherein the reference living body detection model is a living body detection model in a reference scene.
16. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any one of claims 1 to 14.
17. A computer program product storing at least one instruction for loading by a processor and performing the method steps of any one of claims 1 to 14.
18. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1-14.
CN202310604708.7A 2023-05-24 2023-05-24 Living body detection method and device, storage medium and electronic equipment Pending CN116798129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310604708.7A CN116798129A (en) 2023-05-24 2023-05-24 Living body detection method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310604708.7A CN116798129A (en) 2023-05-24 2023-05-24 Living body detection method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116798129A true CN116798129A (en) 2023-09-22

Family

ID=88046150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310604708.7A Pending CN116798129A (en) 2023-05-24 2023-05-24 Living body detection method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116798129A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117539452A (en) * 2024-01-10 2024-02-09 天翼电子商务有限公司 Face recognition method and device and electronic equipment
CN117539452B (en) * 2024-01-10 2024-04-30 天翼电子商务有限公司 Face recognition method and device and electronic equipment

Similar Documents

Publication Publication Date Title
US11463631B2 (en) Method and apparatus for generating face image
CN109189544B (en) Method and device for generating dial plate
CN112069414A (en) Recommendation model training method and device, computer equipment and storage medium
CN111767554B (en) Screen sharing method and device, storage medium and electronic equipment
US11087140B2 (en) Information generating method and apparatus applied to terminal device
CN111125663B (en) Control method and device for child mode, storage medium and terminal
CN109934191A (en) Information processing method and device
CN115131603A (en) Model processing method and device, storage medium and electronic equipment
CN116798129A (en) Living body detection method and device, storage medium and electronic equipment
CN116304007A (en) Information recommendation method and device, storage medium and electronic equipment
CN115049068A (en) Model processing method and device, storage medium and electronic equipment
CN110059748A (en) Method and apparatus for output information
CN116129534A (en) Image living body detection method and device, storage medium and electronic equipment
CN116823537A (en) Insurance report processing method and device, storage medium and electronic equipment
CN115620111A (en) Image identification method and device, storage medium and electronic equipment
CN116152403B (en) Image generation method and device, storage medium and electronic equipment
CN116246014B (en) Image generation method and device, storage medium and electronic equipment
CN116343350A (en) Living body detection method and device, storage medium and electronic equipment
CN116778585A (en) Living body detection method and device, storage medium and electronic equipment
CN116246322A (en) Living body anti-attack method and device, storage medium and electronic equipment
CN116152403A (en) Image generation method and device, storage medium and electronic equipment
CN115798057A (en) Image processing method and device, storage medium and electronic equipment
CN116168451A (en) Image living body detection method and device, storage medium and electronic equipment
CN116934395A (en) Feature processing method and device, storage medium and electronic equipment
CN107977611A (en) Word conversion method, terminal and computer-readable recording medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination