CN116168452A - Living body detection method, living body detection device and computer storage medium - Google Patents

Living body detection method, living body detection device and computer storage medium

Info

Publication number
CN116168452A
CN116168452A
Authority
CN
China
Prior art keywords
living body
body detection
detection result
voice
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211695760.XA
Other languages
Chinese (zh)
Inventor
武文琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211695760.XA priority Critical patent/CN116168452A/en
Publication of CN116168452A publication Critical patent/CN116168452A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/57Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for processing of video signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Acoustics & Sound (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present specification provide a living body detection method, apparatus, and computer storage medium. The method may include: determining, according to environment information, the weights assigned to the video living body detection result and the voice living body detection result in a comprehensive decision, and determining whether the target object is a living body based on each living body detection result and its corresponding weight. By considering living body detection results of different types together with their corresponding weights, the method and device reduce, as far as possible, the influence of environmental factors such as strong light and noise on the living body detection result, thereby effectively improving the accuracy of living body detection.

Description

Living body detection method, living body detection device and computer storage medium
Technical Field
Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a living body detection method, a living body detection device, and a computer storage medium.
Background
Living body detection technology, also known as face anti-spoofing (FAS) technology, is used to judge whether the face captured by a camera is a real face or a forged one, so as to effectively resist common attack means such as photos, videos, face swapping, masks, occlusion, 3D animation, and screen re-shooting, and to help the system identify behaviors such as fraud and false authentication.
Disclosure of Invention
The embodiments of the present specification provide a living body detection method, apparatus, and computer storage medium, which can effectively improve the stability of a downstream system.
In a first aspect, embodiments of the present disclosure provide a method of in vivo detection, the method comprising:
displaying a shooting interface and acquiring environment information of a target object;
shooting the target object based on the shooting interface to obtain video information and voice information of the target object;
determining a video living body detection result based on the video information of the target object, and determining a voice living body detection result based on the voice information of the target object;
determining a weight corresponding to the video living body detection result and a weight corresponding to the voice living body detection result based on the environment information;
and determining a living body detection result of the target object based on the video living body detection result and its corresponding weight, and the voice living body detection result and its corresponding weight.
In a second aspect, the present specification provides a living body detection apparatus, the apparatus comprising:
the acquisition module is used for displaying a shooting interface and acquiring environment information of a target object;
The obtaining module is used for photographing the target object based on the photographing interface to obtain video information and voice information of the target object;
the first determining module is used for determining a video living body detection result based on the video information of the target object and determining a voice living body detection result based on the voice information of the target object;
the second determining module is used for determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information;
and a third determining module, configured to determine a living body detection result of the target object based on the video living body detection result and its corresponding weight, and the voice living body detection result and its corresponding weight.
In a third aspect, the illustrative embodiments provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.
In a fourth aspect, embodiments of the present disclosure provide an electronic device, which may include: a processor and a memory;
wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
In a fifth aspect, the present description provides a program product comprising instructions which, when run on a computer, cause the computer to perform the above-described method steps.
The technical scheme provided by some embodiments of the present specification has the following beneficial effects:
according to the method and the device, the corresponding weights of the video living body detection result and the voice living body detection result can be determined according to the environment information when the comprehensive decision is made, and whether the target object is a living body or not is determined based on the living body detection results and the corresponding weights. According to the method and the device, the influence of environmental information such as strong light and noise on the living body detection result can be reduced as much as possible by considering the living body detection results of different types and the weights corresponding to the detection results, so that the accuracy of living body detection is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present description, the drawings required in the embodiments are briefly described below. It is apparent that the drawings in the following description are only some embodiments of the present description, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
Fig. 1 is a system architecture diagram of a living body detection method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a living body detection method according to an embodiment of the present disclosure;
FIG. 3a is a schematic flow chart of another living body detection method according to an embodiment of the present disclosure;
FIG. 3b is a first application scenario diagram of another in-vivo detection method provided by embodiments of the present disclosure;
fig. 3c is a second application scenario diagram of another living body detection method according to an embodiment of the present disclosure;
FIG. 4a is a schematic flow chart of another living body detection method according to an embodiment of the present disclosure;
FIG. 4b is an application scenario diagram of another living body detection method according to an embodiment of the present disclosure;
fig. 5 is a schematic structural view of a living body detection apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present description as detailed in the accompanying claims.
In the description of the present specification, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The specific meaning of the terms in this specification will be understood by those of ordinary skill in the art in the light of the specific circumstances. In addition, in the description of the present specification, unless otherwise indicated, "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
With the widespread use of facial recognition technology, security has become an increasing focus. The living body detection technology among the face recognition technologies is a basis for improving the face recognition security.
Specifically, in-vivo detection techniques can be divided into static living body detection and dynamic living body detection. Static living body detection determines whether a still picture shows the real user or is a re-captured copy, and is generally used in scenes with low attack-protection requirements, for example scenes that merely require uploading a real user avatar. Dynamic living body detection determines whether the target object is a real user by instructing the user to perform specified actions, and is generally used in scenes such as comparing the user against an identity-card photo to confirm identity. At present, such person-ID verification is applied very widely, from real-name authentication in Internet systems to identity verification for banks, securities, social security and the like.
As living body attack modes multiply, some advanced dynamic attack modes have appeared, such as injecting an attack video while bypassing the camera, in order to deceive face liveness systems. To defeat such attacks, various interactive dynamic living body detection methods can be employed, such as blinking, turning the head, or lip reading. Their common idea is to introduce random variables into the detection process, which lowers the success rate of attack videos prepared in advance. However, blinking and head turning require a relatively long interaction time, and the lip-reading interaction mode places high demands on the user's mouth movements. All of the above approaches therefore tend to degrade the user experience.
Fig. 1 shows the system architecture to which the living body detection method of the embodiments of the present specification is applied. As shown in fig. 1, the execution subject of the embodiments is a communication terminal, including but not limited to: handheld devices, personal computers, tablet computers, vehicle-mounted devices, smart phones, computing devices, or other processing devices connected to a wireless modem. Terminal devices may be called by different names in different networks, for example: user equipment, access terminal, subscriber unit, subscriber station, mobile station, remote terminal, mobile device, user terminal, wireless communication device, user agent or user equipment, cellular telephone, cordless telephone, personal digital assistant (PDA), or a terminal device in a fifth-generation mobile network (5G) or a future evolved network. The terminal system refers to the operating system running on the terminal, a program that manages and controls the terminal hardware and terminal applications and is an indispensable system component of the terminal. Examples include the Android system, the iOS system, the Windows Phone (WP) system, and the Ubuntu mobile operating system, without being limited thereto.
According to some embodiments, the communication terminal may be connected to the server through a network. The network is used to provide a communication link between the communication terminal and the server. The network may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. It should be understood that the number of terminals, networks and servers in fig. 1 is merely illustrative. There may be any number of terminals, networks and servers as practical. For example, the server may be a server cluster formed by a plurality of servers.
Next, the living body detection method provided in the embodiments of the present specification is described in conjunction with the system architecture of fig. 1. In one embodiment, fig. 2 provides a schematic flow chart of a living body detection method. As shown in fig. 2, the living body detection method may include the following steps:
s201, displaying a shooting interface and acquiring environment information of a target object.
The shooting interface represents an interface displayed by the communication terminal to shoot the target object. The environment information indicates information in the current environment where the target object is located, such as light intensity, noise intensity, and the like, acquired by the communication terminal.
Possibly, the light intensity is obtained by a light sensor on the communication terminal. The noise intensity is collected by a noise collection means (e.g. a microphone) on the communication terminal. Here, noise refers to sound in the environment other than sound emitted from the target object.
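For illustration only, one possible way to derive a noise-intensity figure from the microphone samples is sketched below; the exact measurement method is not specified by this embodiment, and the dB-style RMS estimate and names used here are assumptions:

    import numpy as np

    def noise_intensity_db(audio_samples, eps=1e-12):
        # Rough noise-level estimate from raw microphone samples scaled to [-1, 1]:
        # root-mean-square energy converted to a decibel-style value.
        samples = np.asarray(audio_samples, dtype=np.float64)
        rms = np.sqrt(np.mean(np.square(samples)))
        return 20.0 * np.log10(rms + eps)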
S202, shooting the target object based on a shooting interface to obtain video information and voice information of the target object.
The video information of the target object in the embodiment of the present disclosure represents a video of the target object captured by the communication terminal in a preset time. The voice information of the target object indicates the voice of the target object recorded by the communication terminal in a preset time.
Possibly, the video information of the target object in the embodiment of the present specification includes a plurality of face image frames in succession.
Possibly, the speech information of the target object in the embodiment of the present specification includes a continuous multi-frame speech word segmentation signal.
It can be understood that, in the embodiments of the present disclosure, the video information and the voice information of the target object may be obtained automatically by the communication terminal, or may be obtained after the user clicks a shooting button in the shooting interface displayed by the communication terminal.
S203, determining a video living body detection result based on the video information of the target object, and determining a voice living body detection result based on the voice information of the target object.
Wherein the video living detection result indicates a probability of determining whether the target object is a living body based on the video information, and the voice living detection result indicates a probability of determining whether the target object is a living body based on the voice information. For example, the probability that the target object is a living body may be determined to be 70% based on the video information of the target object, and the probability that the target object is a living body may be determined to be 80% based on the voice information of the target object.
It is understood that the embodiment of the present specification can determine whether the target object is a living body by the information such as the facial feature of the target object in the video information, and determine whether the target object is a living body by the information such as the voice feature of the target object in the voice information.
S204, determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information.
Possibly, the environmental information in the embodiment of the present specification may include: ambient light intensity.
Specifically, embodiments of the present specification may determine the weight corresponding to the video living body detection result based on a first mapping relation, and then determine the weight corresponding to the voice living body detection result based on the weight corresponding to the video living body detection result. The first mapping relation represents the relation between the ambient light intensity and the weight corresponding to the video living body detection result.
It may be understood that when the ambient light intensity is within a preset range, the weight corresponding to the video living body detection result may be a fixed value. For example, when the ambient light intensity is within the preset range, the weight corresponding to the video living body detection result determined from the first mapping relation may be 0.5, so the weight corresponding to the voice living body detection result is 1-0.5=0.5. When the ambient light intensity is below the minimum or above the maximum of the preset range, the weight corresponding to the video living body detection result decreases as the ambient light intensity decreases or increases. For example, suppose the preset range of the ambient light intensity is 40-60. If the ambient light intensity is 30, the weight corresponding to the video living body detection result determined from the first mapping relation may be 0.35, so the weight corresponding to the voice living body detection result is 1-0.35=0.65. If the ambient light intensity is 20, the weight corresponding to the video living body detection result determined from the first mapping relation may be 0.23, so the weight corresponding to the voice living body detection result is 1-0.23=0.77.
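For illustration, the first mapping relation could be realized as a simple piecewise function. The sketch below is an assumption, not the claimed implementation; the preset range 40-60, base weight 0.5 and slope are example values and will not exactly reproduce every figure above:

    def video_weight_from_light(light_intensity, lower=40.0, upper=60.0, base=0.5, slope=0.015):
        # Fixed weight inside the preset range; outside it, the weight shrinks
        # the further the ambient light intensity drifts from the range.
        if lower <= light_intensity <= upper:
            return base
        gap = (lower - light_intensity) if light_intensity < lower else (light_intensity - upper)
        return max(0.0, base - slope * gap)

    def voice_weight_from_video(video_weight):
        # The voice weight is the complement of the video weight (the two sum to 1).
        return 1.0 - video_weight

    w_video = video_weight_from_light(30)       # 0.35, as in the example above
    w_voice = voice_weight_from_video(w_video)  # 0.65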
Possibly, the environmental information in the embodiment of the present specification may include: noise intensity.
Specifically, embodiments of the present specification may determine the weight corresponding to the voice living body detection result based on a second mapping relation, and then determine the weight corresponding to the video living body detection result based on the weight corresponding to the voice living body detection result. The second mapping relation represents the relation between the noise intensity and the weight corresponding to the voice living body detection result.
It is understood that the weight corresponding to the voice living body detection result may be a fixed value when the noise intensity is within a preset range, and is inversely related to the noise intensity when the noise intensity exceeds the preset range; that is, the greater the noise intensity, the smaller the weight corresponding to the voice living body detection result. For example, when the noise intensity is within the preset range of 0-60, the weight corresponding to the voice living body detection result determined from the second mapping relation is 0.5, so the weight corresponding to the video living body detection result is 1-0.5=0.5. If the noise intensity is 70 and the weight corresponding to the voice living body detection result determined from the second mapping relation is 0.4, the weight corresponding to the video living body detection result is 1-0.4=0.6. If the noise intensity is higher still and the weight corresponding to the voice living body detection result determined from the second mapping relation is 0.26, the weight corresponding to the video living body detection result is 1-0.26=0.74.
Possibly, the environmental information in the embodiment of the present specification may include: ambient light intensity and noise intensity.
Specifically, embodiments of the present specification may determine the weight corresponding to the video living body detection result based on a third mapping relation, and determine the weight corresponding to the voice living body detection result based on a fourth mapping relation. The third mapping relation represents the relation between the ambient light intensity and the weight corresponding to the video living body detection result; the fourth mapping relation represents the relation between the noise intensity and the weight corresponding to the voice living body detection result.
It is understood that, similarly, when the ambient light intensity is within its preset range, the weight corresponding to the video living body detection result may be a fixed value; when the ambient light intensity is below the minimum or above the maximum of the preset range, the weight decreases as the ambient light intensity decreases or increases. For example, if the preset range of the ambient light intensity is 40-60 and the ambient light intensity is 20, the weight corresponding to the video living body detection result determined from the third mapping relation may be 0.23. Likewise, the weight corresponding to the voice living body detection result may be a fixed value when the noise intensity is within its preset range, and decreases as the noise intensity rises beyond it. For example, when the noise intensity is within the preset range of 0-60, the weight corresponding to the voice living body detection result determined from the fourth mapping relation may be 0.5; if the noise intensity is 70, the weight determined from the fourth mapping relation may be 0.4.
It can be understood that when the ambient light intensity and the noise intensity both exceed their respective preset ranges, the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result both decrease, that is, the sum of the two weights is less than 1.
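A sketch of how the third and fourth mapping relations could be looked up independently is given below; the function name, ranges and slope values are assumptions chosen only to match the example figures above. With light intensity 20 and noise intensity 70 both outside their preset ranges, the two weights sum to less than 1:

    def weight_from_mapping(value, preset_range, base=0.5, slope=0.01):
        # Generic lookup used for both the light->video and noise->voice mappings:
        # a fixed base weight inside the preset range, attenuated outside it.
        lo, hi = preset_range
        if lo <= value <= hi:
            return base
        gap = lo - value if value < lo else value - hi
        return max(0.0, base - slope * gap)

    w_video = weight_from_mapping(20, (40, 60), base=0.5, slope=0.0135)  # ambient light 20 -> 0.23
    w_voice = weight_from_mapping(70, (0, 60), base=0.5, slope=0.01)     # noise 70 -> 0.4
    total = w_video + w_voice                                            # 0.63 < 1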
S205, determining a living body detection result of the target object based on the weights corresponding to the video living body detection result and the video living body detection result, and the weights corresponding to the voice living body detection result and the voice living body detection result.
It can be appreciated that, in embodiments of the present specification, the video living body detection result with its corresponding weight and the voice living body detection result with its corresponding weight may be combined in a comprehensive decision to obtain the living body detection result of the target object, which indicates that the target object is either a living body or a non-living body.
Specifically, the living body detection result of the target object may be determined by comparing, against a living body result threshold, the value obtained from the comprehensive decision over the video living body detection result and its weight and the voice living body detection result and its weight.
For example, suppose the video living body detection result is 70% with a corresponding weight of 0.5, and the voice living body detection result is 80% with a corresponding weight of 0.5. The value obtained from the comprehensive decision is 70%×0.5+80%×0.5=75%. With a living body result threshold of 60%, since 75%>60%, the target object can be determined to be a living body.
For another example, suppose the video living body detection result is 70% with a corresponding weight of 0.23, and the voice living body detection result is 80% with a corresponding weight of 0.4. The value obtained from the comprehensive decision is 70%×0.23+80%×0.4=48.1%. With a living body result threshold of 60%, since 48.1%<60%, the target object can be determined to be a non-living body.
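The comprehensive decision in both examples is simply a weighted sum compared with a threshold; the minimal sketch below assumes the 60% threshold used in the examples:

    def fuse_liveness(results_and_weights, threshold=0.60):
        # results_and_weights: list of (liveness_probability, weight) pairs.
        score = sum(prob * weight for prob, weight in results_and_weights)
        return ("living body" if score > threshold else "non-living body"), score

    fuse_liveness([(0.70, 0.5), (0.80, 0.5)])    # ("living body", 0.75)
    fuse_liveness([(0.70, 0.23), (0.80, 0.40)])  # ("non-living body", 0.481)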
Therefore, the present specification can determine, according to the environment information, the weights assigned to the video living body detection result and the voice living body detection result in the comprehensive decision, and then determine whether the target object is a living body based on each living body detection result and its corresponding weight. By considering living body detection results of different types together with their corresponding weights, the influence of environmental factors such as strong light and noise on the living body detection result is reduced as far as possible, thereby improving the accuracy of living body detection. In addition, the weight corresponding to each detection result can be dynamically adjusted according to the current environment information, which shortens information processing and detection time and improves the user experience.
In one embodiment, fig. 3a provides a schematic flow chart of a living body detection method. As shown in fig. 3a, the living body detection method includes the following steps:
s301, displaying a shooting interface and acquiring environment information of a target object.
The shooting interface in the embodiment of the present disclosure includes random glare information and random text information.
Specifically, the random glare information in the embodiments of the present disclosure indicates that the shooting interface of the communication terminal emits light according to a random glare sequence of a preset number of frames, so as to shine light of multiple different colors onto the target object. The random text information represents text randomly generated by the communication terminal, which may consist of letters, numbers, or a combination of both. For example, the random text displayed on the shooting interface of the communication terminal shown in fig. 3b is the number 3124.
S302, shooting the target object based on random colorful information in a shooting interface to obtain colorful video information and voice information of the target object.
In particular, the glare emitted during shooting may vary, and the variation may include one or more of the following random elements: the colors of the lights in the glare sequence, the positions of the lights of each color, and the flash frequency of the lights of each color. For example, the shooting process may use five colors of light (red, yellow, blue, green, and orange), with the order of appearance of the five lights differing between shooting sessions and the flash frequency of the five lights also differing between sessions.
For example, when the target object A is photographed, the random glare information in the shooting interface may, in order, consist of red light flashing for 1 s, yellow light for 0.5 s, orange light for 0.7 s, green light for 0.4 s, and blue light for 1 s. When the target object B is photographed, the random glare information may, in order, consist of blue light flashing for 0.5 s, yellow light for 0.7 s, red light for 0.9 s, green light for 1 s, and orange light for 0.5 s.
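An illustrative way to generate such a random glare sequence is sketched below; the color list and duration choices are assumptions based on the examples above rather than the patented generator:

    import random

    GLARE_COLORS = ["red", "yellow", "blue", "green", "orange"]
    FLASH_DURATIONS = (0.4, 0.5, 0.7, 0.9, 1.0)  # seconds

    def random_glare_sequence():
        # Random order of colors, each with a randomly chosen flash duration.
        colors = random.sample(GLARE_COLORS, k=len(GLARE_COLORS))
        return [(color, random.choice(FLASH_DURATIONS)) for color in colors]

    random_glare_sequence()
    # e.g. [("blue", 0.5), ("yellow", 0.7), ("red", 0.9), ("green", 1.0), ("orange", 0.5)]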
It will be appreciated that the target object needs to read out text information in the photographing interface during photographing to verify whether the target object is a living body.
S303, determining a dazzle living body detection result based on the dazzle video information of the target object.
The colorful living body detection result in the embodiment of the specification is a video living body detection result.
It is understood that embodiments of the present specification can judge liveness through facial reflection: the principle of colorful (glare) living body detection is that objects of different materials selectively absorb light of different wavelengths, so when the light switches between colors, the color difference produced by light striking a real face (diffuse reflection) differs from that produced by light striking a screen or paper.
Possibly, embodiments of the present specification may extract the face features of the colorful video of the target object to obtain a face colorful video of the target object, and then sample the face colorful video at a preset time interval to obtain a face colorful image set comprising a plurality of face colorful image frames. Further, the colorful living body detection result can be determined based on the color-conversion frame-difference information corresponding to the face colorful image frames in the set and the random colorful information.
Specifically, in embodiments of the present specification, a face alignment operation is first performed between the multi-frame image sequence of the target object and a reference face image; after alignment, the color-conversion frame-difference information between adjacent frames of the target face sequence and the color-conversion frame-difference information between adjacent frames of the reference face image corresponding to the random colorful information are determined; the two are then compared frame by frame to obtain the colorful living body detection result.
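A minimal sketch of that frame-by-frame comparison is given below, assuming the face frames are already aligned to the reference image and that the expected color-conversion frame differences for the emitted glare sequence are available; all names and the cosine-similarity choice are illustrative assumptions, not the patented implementation:

    import numpy as np

    def color_frame_diffs(frames):
        # Color change between consecutive aligned frames (arrays of identical shape).
        return [frames[i + 1].astype(np.float32) - frames[i].astype(np.float32)
                for i in range(len(frames) - 1)]

    def glare_liveness_score(face_frames, reference_diffs):
        # Compare observed color-conversion frame differences against the expected
        # differences for the random glare sequence; a higher score suggests a live face.
        observed = color_frame_diffs(face_frames)
        similarities = []
        for obs, ref in zip(observed, reference_diffs):
            o, r = obs.ravel(), ref.ravel()
            denom = float(np.linalg.norm(o) * np.linalg.norm(r)) or 1.0
            similarities.append(float(np.dot(o, r)) / denom)  # cosine similarity per frame pair
        return float(np.mean(similarities)) if similarities else 0.0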
S304, determining a voice living body detection result based on the consistency of the voice information of the target object and the random text information.
It can be appreciated that the embodiment of the present specification may determine the voice live detection result according to the consistency, i.e. the similarity degree, of the collected voice information of the target object and the random text information.
Possibly, embodiments of the present specification may perform word segmentation on the collected voice information of the target object to obtain a plurality of voice segments. The voice living body detection result can then be obtained by comparing each voice segment, one by one, with the voice segment corresponding to the random text information.
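A sketch of that one-by-one comparison, assuming a speech-recognition function is available to transcribe each voice segment; the recognize callable and the scoring scheme are assumptions for illustration:

    def voice_liveness_score(voice_segments, random_text, recognize):
        # voice_segments: per-character audio clips in spoken order;
        # recognize: assumed ASR callable mapping a clip to the character it contains.
        expected = list(random_text)          # e.g. "3124" -> ["3", "1", "2", "4"]
        if not expected or len(voice_segments) != len(expected):
            return 0.0
        hits = sum(recognize(seg) == char for seg, char in zip(voice_segments, expected))
        return hits / len(expected)           # fraction of segments matching the shown text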
S305, determining a living body detection result of the target object based on the dazzling living body detection result, the weight corresponding to the dazzling living body detection result and the weight corresponding to the voice living body detection result and the voice living body detection result.
For example, if the glare living body detection result is 40% with a corresponding weight of 0.5, and the voice living body detection result is 70% with a corresponding weight of 0.5, then the value obtained from the comprehensive decision is 40%×0.5+70%×0.5=55%. With a living body result threshold of 60%, since 55%<60%, the target object can be determined to be a non-living body.
Possibly, the embodiment of the present disclosure may further extract a mouth feature in the video information of the target object to obtain a mouth feature video of the target object; determining lip live detection results based on the mouth feature video of the target object and the voice information of the target object; determining a weight corresponding to the colorful living body detection result, a weight corresponding to the voice living body detection result and a weight corresponding to the lip language living body detection result based on the environment information; and determining the living body detection result of the target object based on the dazzling living body detection result, the weight corresponding to the voice living body detection result and the voice living body detection result, and the weight corresponding to the lip language living body detection result and the lip language living body detection result.
It can be understood that colorful living body detection illuminates the face with multiple colors by flashing the screen and extracts interaction information to judge whether a living body attack is taking place. However, when switching rapidly between high-frequency lights of different colors, some older mobile phones may stutter or drop frames, which introduces considerable uncertainty into subsequent processing stages. Therefore, the embodiments of the present specification add a lip language living body detection method, extracting interaction information by having the target object read the random text information aloud, so as to reduce the influence of the colorful living body detection on the final living body detection result.
It will be appreciated that since the process of lip live detection may be affected by both light intensity and noise intensity, the weight corresponding to the lip live detection result is also determined based on the environmental information.
Possibly, when a comprehensive decision needs to be made over the colorful living body detection result, the voice living body detection result, and the lip language living body detection result, the weights corresponding to these three results need to be recalculated, for example to 1/3 each.
Specifically, in the embodiment of the present disclosure, a mouth feature video of a target object may be sampled based on a target time interval to obtain a mouth image set of the target object; word segmentation processing is carried out on the voice information of the target object, so that a voice word segmentation set of the target object is obtained; aligning a plurality of mouth image frames in a mouth image set and a plurality of voice word segmentation fragments in a voice word segmentation set to obtain lip language information of a target object; and comparing the lip language information of the target object with the lip language characteristic information corresponding to the random text to obtain a lip language living body detection result. Wherein the mouth image set may comprise a plurality of mouth image frames; the speech word set may include a plurality of speech word segments.
It will be appreciated that embodiments of the present disclosure sample the mouth feature video of the target object to obtain the frame-by-frame mouth features of the target object while it reads the random text information. Meanwhile, word segmentation is performed on the extracted voice signal so that each voice segment corresponds to one character pronounced from the shooting interface, and a deep convolutional neural network is then used for feature extraction and further processing to obtain a voice classification result for each word segment in the voice information of the target object. The mouth image frames and the voice frames are then aligned to remove redundant frames, reducing the computation required in subsequent processing. Finally, the aligned mouth image frames and voice frames are compared with each other to obtain the lip language living body detection result.
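An illustrative sketch of the alignment and comparison step, assuming per-word mouth and speech classifiers are available; both classifiers and the timestamp representation are assumptions, not the claimed networks:

    def lip_liveness_score(mouth_frames, frame_times, voice_segments, segment_spans,
                           classify_mouth, classify_speech):
        # Align mouth image frames to each voice word segment by timestamp,
        # discard frames outside any segment (redundant frames), then compare
        # the word predicted from the mouth frames with the word recognised from speech.
        hits, total = 0, 0
        for segment, (t_start, t_end) in zip(voice_segments, segment_spans):
            frames = [f for f, t in zip(mouth_frames, frame_times) if t_start <= t <= t_end]
            if not frames:
                continue
            total += 1
            if classify_mouth(frames) == classify_speech(segment):
                hits += 1
        return hits / total if total else 0.0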
Referring to fig. 3c, in a specific example, the communication terminal obtains a video signal and a voice signal through the shooting interface, where the video signal is the colorful video information and the voice signal is the voice information collected by a microphone. The face video information in the colorful video information first needs to be extracted; it can be understood that if sufficient key-point localization information cannot be obtained from the face video, the face needs to be rectified before the time-series colorful face atlas and the time-series mouth atlas are extracted separately. The collected voice signal is first segmented into a plurality of voice word segments, which are then aligned with the image frames in the time-series mouth atlas to obtain an aligned time-series mouth atlas. The time-series colorful face atlas, the aligned time-series mouth atlas, and the voice word segments are fed into corresponding deep convolutional neural networks (each network has the same number of internal layers, i.e. the same output data structure), yielding the colorful living body detection result, the lip language living body detection result, and the voice living body detection result respectively. A fusion decision is then made based on the weights corresponding to each living body detection result to obtain the living-body/attack classification, i.e. the living body detection result of the target object is either attack (non-living body) or living body.
Thus, the embodiments of the present disclosure provide a living body detection method based on random text and colorful video that collects video and voice signals separately. On one hand, the face colorful image set and the mouth image set are extracted from the video signal and classified by the colorful and lip language branches respectively, each outputting a living body attack detection result. On the other hand, the corresponding voice result is recognized from the word-segmented voice signal. Finally, the multiple living body detection results are fused through a fusion decision to obtain the final living body detection result. This multi-modal, multi-random-variable fusion adapts better to different scenes and reduces the influence of environmental factors on detection, thereby improving detection accuracy.
In one embodiment, fig. 4a provides a schematic flow chart of a living body detection method. As shown in fig. 4a, the living body detection method includes the following steps:
s401, displaying a shooting interface and acquiring environment information of a target object.
The shooting interface in the embodiment of the present disclosure may further include eye indication information.
Possibly, the eye-indicating information in the embodiment of the present specification may be random, for example, when the photographing interface of the communication terminal photographs the target object a, the displayed eye-indicating information is left, down, up in order; when shooting the target object B, the displayed eye instruction information is upward, rightward, upward in order.
It is possible that the eye-indicating information in the embodiments of the present disclosure may be fixed, that is, the eye-indicating information displayed by the communication terminal is the same every time.
For example, each time the communication terminal displays the initial shooting interface 1 shown in fig. 4b, the eye indication information is then displayed, in order, as shooting interface 2 ("please look up") and shooting interface 3 ("please look right").
S402, shooting the target object based on random colorful information in a shooting interface to obtain colorful video information and voice information of the target object.
Specifically, S402 corresponds to S302, and will not be described here.
S403, extracting eye features in the dazzling video information of the target object to obtain an eye feature video of the target object.
It will be appreciated that the purpose of extracting ocular features in the embodiments of the present disclosure is to obtain pupil movement information of a target subject.
S404, determining an eye living body detection result based on consistency of the eye characteristic video of the target object and eye indication information in a shooting interface.
Specifically, in the embodiment of the present specification, it is necessary to compare the eye feature video of the target object, that is, pupil movement information, with the eye indication information in the photographing interface to determine the eye living body detection result.
For example, if the eye indication information in the shooting interface is left, down, up in turn, while the pupil movement sequence in the eye feature video of the target object A is right, down, up in turn, then two of the three indications are matched and the eye living body detection result is a living-body probability of 67%.
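That consistency check can be sketched as a simple sequence comparison (an illustration only; the direction labels are assumptions):

    def eye_liveness_score(pupil_movements, indicated_directions):
        # Fraction of on-screen eye indications matched by the detected pupil movements.
        if not indicated_directions:
            return 0.0
        hits = sum(p == d for p, d in zip(pupil_movements, indicated_directions))
        return hits / len(indicated_directions)

    eye_liveness_score(["right", "down", "up"], ["left", "down", "up"])  # 2/3 ≈ 0.67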
S405, determining a dazzle living body detection result based on the dazzle video information of the target object.
Specifically, S405 is identical to S303, and will not be described here again.
S406, determining a voice living body detection result based on the consistency of the voice information of the target object and the random text information.
Specifically, S406 corresponds to S304, and will not be described here.
S407, determining a weight corresponding to the glare living body detection result, a weight corresponding to the voice living body detection result, a weight corresponding to the lip living body detection result, and a weight corresponding to the eye living body detection result based on the environmental information.
It can be appreciated that, since the eye feature video of the target object is obtained based on the glaring video information, the extracted eye feature video of the target object is also affected by the light intensity. Therefore, the weight corresponding to the eye living detection result is also determined based on the environmental information.
Possibly, since the eye feature video of the target object is obtained based on the colorful video information, the weight corresponding to the eye living detection result is the same as the weight corresponding to the colorful living detection result.
Further, when the fusion decision is made in the embodiment of the present specification, the weight corresponding to the multiple living body detection results of each item needs to be recalculated.
Possibly, since lip language living body detection is affected by both the light intensity and the noise intensity, and the eye feature video of the target object is obtained from the dazzling video information, when both the light intensity and the noise intensity lie within their preset ranges each of the four weights may simply be 25%.
Possibly, when the light intensity and the noise intensity are both within their corresponding preset ranges, the weight x₁ corresponding to the colorful living body detection result, the weight x₂ corresponding to the eye living body detection result, the weight x₃ corresponding to the voice living body detection result, and the weight x₄ corresponding to the lip language living body detection result satisfy the following formula:
x₁ + x₂ + x₃ + x₄ = 1
it can be understood that any combination of weights satisfying the above formula may be used, for example a weight of 1/5 for the colorful living body detection result, 1/5 for the eye living body detection result, 1/3 for the voice living body detection result, and 4/15 for the lip language living body detection result (1/5+1/5+1/3+4/15=1).
Specifically, when the light intensity and the noise intensity are not in the respective preset ranges, according to the respective mapping relationships, the weight corresponding to the colorful living body detection result, the weight corresponding to the eye living body detection result, the weight corresponding to the voice living body detection result, and the weight corresponding to the lip living body detection result may be reduced.
S408, determining a living body detection result of the target object based on the colorful living body detection result and the weight corresponding to the colorful living body detection result, the weight corresponding to the voice living body detection result and the voice living body detection result, the weight corresponding to the lip language living body detection result and the lip language living body detection result, and the weight corresponding to the eye living body detection result and the eye living body detection result.
In a specific example, suppose the colorful living body detection result is 80% with weight 1/5, the voice living body detection result is 90% with weight 1/3, the lip language living body detection result is 30% with weight 4/15, and the eye living body detection result is 70% with weight 1/5. The value obtained from the comprehensive decision is 80%×1/5+90%×1/3+30%×4/15+70%×1/5=68%. With a living body result threshold of 60%, since 68%>60%, the target object can be determined to be a living body.
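The fused value in this example can be checked with the same weighted-sum decision sketched earlier (the 60% threshold is taken from the example):

    results = [(0.80, 1/5), (0.90, 1/3), (0.30, 4/15), (0.70, 1/5)]  # glare, voice, lip, eye
    score = sum(prob * weight for prob, weight in results)           # 0.16 + 0.30 + 0.08 + 0.14 = 0.68
    is_live = score > 0.60                                           # True -> living body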
Thus, the embodiments of the present specification add eye living body detection on top of the random-text and dazzling-video living body detection scheme. Because the eye feature information of the target object can be extracted directly from the captured dazzling video information, no secondary capture or recognition is needed, which not only shortens the detection time and improves the user experience but also further improves the accuracy of living body detection.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Fig. 5 is a schematic structural view of the living body detection apparatus provided in the embodiment of the present specification. The device is used for executing the living body detection method of any embodiment described in the specification. As shown in fig. 5, the living body detection apparatus may include:
an acquisition module 51, configured to display a shooting interface and acquire environmental information of a target object;
an obtaining module 52, configured to shoot the target object based on the shooting interface to obtain video information and voice information of the target object;
a first determining module 53, configured to determine a video living body detection result based on the video information of the target object, and determine a voice living body detection result based on the voice information of the target object;
A second determining module 54, configured to determine a weight corresponding to the video living body detection result and a weight corresponding to the voice living body detection result based on the environmental information;
and a third determining module 55, configured to determine a living body detection result of the target object based on the video living body detection result and its corresponding weight, and the voice living body detection result and its corresponding weight.
In some embodiments, the environmental information includes: ambient light intensity;
the second determining module includes:
the first weight determining unit is used for determining the weight corresponding to the video living body detection result based on a first mapping relation; the first mapping relation is used for representing the relation between the ambient light intensity and the weight corresponding to the video living body detection result;
and the second weight determining unit is used for determining the weight corresponding to the voice living body detection result based on the weight corresponding to the video living body detection result.
In some embodiments, the environmental information includes: noise intensity;
the second determining module includes:
a third weight determining unit, configured to determine a weight corresponding to the voice living body detection result based on a second mapping relationship; the second mapping relation is used for representing the relation between the noise intensity and the weight corresponding to the voice living body detection result;
and the fourth weight determining unit is used for determining the weight corresponding to the video living body detection result based on the weight corresponding to the voice living body detection result.
In some embodiments, the environmental information includes: ambient light intensity and noise intensity;
the second determining module includes:
a fifth weight determining unit, configured to determine a weight corresponding to the video living body detection result based on the third mapping relationship; the third mapping relationship is used for representing a relationship between the ambient light intensity and the weight corresponding to the video living body detection result;
a sixth weight determining unit, configured to determine a weight corresponding to the voice living body detection result based on the fourth mapping relationship; the fourth mapping relationship is used for representing a relationship between the noise intensity and the weight corresponding to the voice living body detection result.
In some embodiments, the capture interface contains random glare information and random text information;
the obtaining module is specifically configured to: shooting the target object based on random colorful information in the shooting interface to obtain colorful video information and voice information of the target object;
the first determining module includes:
a dazzle living body detection result determining unit, configured to determine a dazzle living body detection result based on the dazzle video information of the target object; the dazzle living body detection result is the video living body detection result;
and the voice living body detection result determining unit is used for determining a voice living body detection result based on the consistency of the voice information of the target object and the random text information.
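A minimal sketch of this consistency check, assuming a caller-supplied speech recognizer (the recognizer itself is not part of these embodiments and its name is hypothetical), might look as follows in Python:

import difflib

def voice_liveness_score(voice_audio, random_text, recognize_speech):
    # recognize_speech is a hypothetical speech-to-text callable supplied by
    # the caller; the embodiments do not prescribe a particular recognizer.
    spoken_text = recognize_speech(voice_audio)
    # Consistency of the spoken content with the displayed random text,
    # expressed as a similarity ratio in [0, 1].
    return difflib.SequenceMatcher(None, spoken_text, random_text).ratio()

The returned ratio can then be thresholded to obtain the voice living body detection result.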
In some embodiments, after the obtaining module and before the third determining module, the apparatus further includes:
the mouth feature video obtaining module is used for extracting mouth features in the video information of the target object to obtain a mouth feature video of the target object;
the lip living body detection result determining module is used for determining lip living body detection results based on the mouth feature video of the target object and the voice information of the target object;
The second determining module is specifically configured to: determine a weight corresponding to the dazzle living body detection result, a weight corresponding to the voice living body detection result and a weight corresponding to the lip language living body detection result based on the environment information;
the third determining module is specifically configured to: determine a living body detection result of the target object based on the dazzle living body detection result and the weight corresponding to the dazzle living body detection result, the voice living body detection result and the weight corresponding to the voice living body detection result, and the lip language living body detection result and the weight corresponding to the lip language living body detection result.
In some embodiments, the lip living body detection result determining module includes:
the mouth image set obtaining unit is used for sampling the mouth characteristic video of the target object based on the target time interval to obtain a mouth image set of the target object; wherein the mouth image set comprises a plurality of mouth image frames;
the voice word segmentation set obtaining unit is used for carrying out word segmentation processing on the voice information of the target object to obtain a voice word segmentation set of the target object; wherein the speech word segmentation set comprises a plurality of speech word segmentation fragments;
The lip information obtaining unit is used for carrying out alignment processing on a plurality of mouth image frames in the mouth image set and a plurality of voice word segmentation fragments in the voice word segmentation set to obtain lip information of the target object;
and the lip living body detection result obtaining unit is used for comparing the lip information of the target object with the lip characteristic information corresponding to the random text to obtain a lip living body detection result.
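Purely as an illustration of the sampling, word segmentation alignment, and comparison steps described above, a Python sketch could be organized as follows; the segment format, the 0.2-second default sampling interval, and the extract_lip_feature and similarity callables are hypothetical assumptions introduced for this example, not elements defined by the embodiments.

def lip_liveness_score(mouth_frames, fps, speech_segments,
                       expected_lip_features, extract_lip_feature,
                       similarity, interval_s=0.2):
    # Sample the mouth feature video at the target time interval.
    step = max(int(round(interval_s * fps)), 1)
    sampled = mouth_frames[::step]

    # speech_segments is the voice word segmentation set; each entry is
    # assumed to be a dict such as {"word": ..., "start": ..., "end": ...}.
    # Align each sampled mouth image frame with the segment spoken at that time.
    aligned = []
    for idx, frame in enumerate(sampled):
        t = idx * step / fps
        segment = next((s for s in speech_segments
                        if s["start"] <= t < s["end"]), None)
        if segment is not None:
            aligned.append((segment["word"], extract_lip_feature(frame)))

    # Compare the aligned lip information with the lip feature information
    # expected for the random text; extract_lip_feature and similarity are
    # hypothetical callables provided by the caller.
    scores = [similarity(feature, expected_lip_features[word])
              for word, feature in aligned if word in expected_lip_features]
    return sum(scores) / len(scores) if scores else 0.0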
In some embodiments, the shooting interface further contains eye indication information;
after the obtaining module and before the third determining module, the apparatus further includes:
the eye feature video obtaining module is used for extracting eye features in the dazzle video information of the target object to obtain an eye feature video of the target object;
the eye living body detection result determining module is used for determining an eye living body detection result based on consistency of the eye feature video of the target object and eye indication information in the shooting interface;
the second determining module is specifically configured to: determine a weight corresponding to the dazzle living body detection result, a weight corresponding to the voice living body detection result, a weight corresponding to the lip language living body detection result and a weight corresponding to the eye living body detection result based on the environment information;
the third determining module is specifically configured to: determine a living body detection result of the target object based on the dazzle living body detection result and the weight corresponding to the dazzle living body detection result, the voice living body detection result and the weight corresponding to the voice living body detection result, the lip language living body detection result and the weight corresponding to the lip language living body detection result, and the eye living body detection result and the weight corresponding to the eye living body detection result.
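As an illustrative sketch only, an eye living body check based on consistency with the eye indication information might be organized as follows in Python; the indication format, the 0.5 dominance ratio, and the estimate_gaze callable are assumptions introduced for this example.

def eye_liveness_score(eye_frames, fps, indications, estimate_gaze):
    # indications is the eye indication information from the shooting
    # interface, assumed here to be entries such as
    # {"direction": "left", "start": 1.0, "end": 2.0} (hypothetical format).
    # estimate_gaze is a hypothetical callable mapping a frame to a direction label.
    hits = 0
    for indication in indications:
        window = [estimate_gaze(frame) for i, frame in enumerate(eye_frames)
                  if indication["start"] <= i / fps < indication["end"]]
        # The indication counts as satisfied when the indicated direction
        # dominates the gaze estimates within its time window.
        if window and window.count(indication["direction"]) / len(window) > 0.5:
            hits += 1
    return hits / len(indications) if indications else 0.0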
It should be noted that, when the living body detection apparatus provided in the foregoing embodiments performs the living body detection method, the division into the foregoing functional modules is merely used as an example; in practical applications, the foregoing functions may be assigned to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the living body detection apparatus and the living body detection method provided in the foregoing embodiments belong to the same concept; for details of the implementation process, refer to the method embodiments, which are not repeated here.
The foregoing description of one or more embodiments is provided for illustration only and does not imply that any embodiment is superior or inferior to another.
Referring to fig. 6, a schematic structural diagram of an electronic device is provided for one or more embodiments of the present disclosure. As shown in fig. 6, the electronic device 60 may include: at least one processor 61, at least one network interface 64, a user interface 63, a memory 65, at least one communication bus 62.
Wherein the communication bus 62 is used to enable connected communication between these components.
The user interface 63 may include a display screen (Display) and a camera (Camera); optionally, the user interface 63 may further include a standard wired interface and a standard wireless interface.
The network interface 64 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The processor 61 may include one or more processing cores. The processor 61 connects various parts of the electronic device 60 through various interfaces and lines, and performs various functions of the electronic device 60 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 65 and by invoking data stored in the memory 65. Optionally, the processor 61 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 61 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; and the modem is used to handle wireless communication. It will be appreciated that the modem may alternatively not be integrated into the processor 61 and may instead be implemented by a separate chip.
The memory 65 may include a random access memory (Random Access Memory, RAM) or a read-only memory (Read-Only Memory, ROM). Optionally, the memory 65 includes a non-transitory computer-readable storage medium (non-transitory computer-readable storage medium). The memory 65 may be used to store instructions, programs, code, code sets, or instruction sets. The memory 65 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, and the like), instructions for implementing the foregoing method embodiments, and the like; and the data storage area may store the data and the like involved in the foregoing method embodiments. Optionally, the memory 65 may further include at least one storage device located remotely from the processor 61. As shown in fig. 6, the memory 65, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a living body detection application program.
In the electronic device 60 shown in fig. 6, the user interface 63 is mainly used for providing an input interface for a user and acquiring data input by the user; and the processor 61 may be configured to invoke the living body detection application program stored in the memory 65 and specifically perform the following operations:
Displaying a shooting interface and acquiring environment information of a target object;
shooting the target object based on the shooting interface to obtain video information and voice information of the target object;
determining a video living body detection result based on the video information of the target object, and determining a voice living body detection result based on the voice information of the target object;
determining a weight corresponding to the video living body detection result and a weight corresponding to the voice living body detection result based on the environment information;
and determining a living body detection result of the target object based on the video living body detection result and the weight corresponding to the video living body detection result, and the voice living body detection result and the weight corresponding to the voice living body detection result.
In some embodiments, the environmental information includes: ambient light intensity;
the processor 61, when executing the determining, based on the environmental information, the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result, specifically executes:
determining a weight corresponding to the video living body detection result based on a first mapping relation; the first mapping relation is used for representing the relation between the ambient light intensity and the weight corresponding to the video living body detection result;
And determining the weight corresponding to the voice living body detection result based on the weight corresponding to the video living body detection result.
In some embodiments, the environmental information includes: noise intensity;
the processor 61, when executing the determining, based on the environmental information, the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result, specifically executes:
determining the weight corresponding to the voice living body detection result based on a second mapping relation; the second mapping relation is used for representing the relation between the noise intensity and the weight corresponding to the voice living body detection result;
and determining the weight corresponding to the video living body detection result based on the weight corresponding to the voice living body detection result.
In some embodiments, the environmental information includes: ambient light intensity and noise intensity;
the processor 61, when executing the determining, based on the environmental information, the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result, specifically executes:
determining a weight corresponding to the video living body detection result based on the third mapping relation; the third mapping relationship is used for representing a relationship between the ambient light intensity and the weight corresponding to the video living body detection result;
Determining a weight corresponding to the voice living body detection result based on the fourth mapping relation; the fourth mapping relationship is used for representing a relationship between the noise intensity and the weight corresponding to the voice living body detection result.
In some embodiments, the shooting interface contains random dazzle information and random text information;
the processor 61, when executing the shooting of the target object based on the shooting interface to obtain video information and voice information of the target object, specifically executes:
shooting the target object based on the random dazzle information in the shooting interface to obtain dazzle video information and voice information of the target object;
the processor 61, when executing the video living body detection result determined based on the video information of the target object, specifically executes:
determining a dazzle living body detection result based on the dazzle video information of the target object; the dazzle living body detection result is the video living body detection result;
the processor 61, when executing the determination of the voice living body detection result based on the voice information of the target object, specifically executes:
and determining a voice living body detection result based on the consistency of the voice information of the target object and the random text information.
In some embodiments, after obtaining the video information and the voice information of the target object and before determining the living body detection result of the target object, the processor 61 further executes:
extracting the mouth characteristics in the video information of the target object to obtain a mouth characteristic video of the target object;
determining lip language living body detection results based on the mouth feature video of the target object and the voice information of the target object;
the processor 61, when executing the determining, based on the environmental information, the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result, specifically executes:
determining a weight corresponding to the dazzle living body detection result, a weight corresponding to the voice living body detection result and a weight corresponding to the lip language living body detection result based on the environment information;
the processor 61, when executing the determining of the living body detection result of the target object based on the video living body detection result and the weight corresponding to the video living body detection result, and the voice living body detection result and the weight corresponding to the voice living body detection result, specifically executes:
and determining a living body detection result of the target object based on the dazzle living body detection result and the weight corresponding to the dazzle living body detection result, the voice living body detection result and the weight corresponding to the voice living body detection result, and the lip language living body detection result and the weight corresponding to the lip language living body detection result.
In some embodiments, when executing the determining of the lip language living body detection result based on the mouth feature video of the target object and the voice information of the target object, the processor 61 specifically executes:
sampling a mouth feature video of the target object based on a target time interval to obtain a mouth image set of the target object; wherein the mouth image set comprises a plurality of mouth image frames;
performing word segmentation processing on the voice information of the target object to obtain a voice word segmentation set of the target object; wherein the speech word segmentation set comprises a plurality of speech word segmentation fragments;
aligning a plurality of mouth image frames in the mouth image set and a plurality of voice word segmentation fragments in the voice word segmentation set to obtain lip information of the target object;
And comparing the lip language information of the target object with the lip language characteristic information corresponding to the random text to obtain a lip language living body detection result.
In some embodiments, the shooting interface further contains eye indication information;
after obtaining the video information and the voice information of the target object, the processor 61 further executes:
extracting eye features in the dazzle video information of the target object to obtain an eye feature video of the target object;
determining an eye living body detection result based on consistency of the eye feature video of the target object and eye indication information in the shooting interface;
the processor 61, when executing the determining, based on the environmental information, the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result, specifically executes:
determining a weight corresponding to the dazzle living body detection result, a weight corresponding to the voice living body detection result, a weight corresponding to the lip language living body detection result and a weight corresponding to the eye living body detection result based on the environment information;
the processor 61, when executing the determining of the living body detection result of the target object based on the video living body detection result and the weight corresponding to the video living body detection result, and the voice living body detection result and the weight corresponding to the voice living body detection result, specifically executes:
and determining a living body detection result of the target object based on the dazzle living body detection result and the weight corresponding to the dazzle living body detection result, the voice living body detection result and the weight corresponding to the voice living body detection result, the lip language living body detection result and the weight corresponding to the lip language living body detection result, and the eye living body detection result and the weight corresponding to the eye living body detection result.
One or more embodiments of the present description also provide a computer-readable storage medium having instructions stored therein, which when executed on a computer or processor, cause the computer or processor to perform one or more of the steps of the embodiments shown in fig. 2, 3a, and 4a described above. The respective constituent modules of the above living body detecting apparatus may be stored in the computer-readable storage medium if implemented in the form of software functional units and sold or used as independent products.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to one or more embodiments of the present description are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL)) or a wireless manner (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital versatile disc (Digital Versatile Disc, DVD)), a semiconductor medium (e.g., a solid state disk (Solid State Disk, SSD)), or the like.
Those skilled in the art will appreciate that all or part of the procedures of the foregoing embodiment methods may be implemented by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the foregoing method embodiments. The aforementioned storage medium includes: a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or the like. The technical features in the present examples and embodiments may be combined arbitrarily provided that there is no conflict.
The above-described embodiments are merely preferred embodiments of the present disclosure, and do not limit the scope of the disclosure, and various modifications and improvements made by those skilled in the art to the technical solutions of the disclosure should fall within the protection scope defined by the claims of the disclosure without departing from the design spirit of the disclosure.

Claims (12)

1. A living body detection method, the method comprising:
displaying a shooting interface and acquiring environment information of a target object;
Shooting the target object based on the shooting interface to obtain video information and voice information of the target object;
determining a video living body detection result based on the video information of the target object, and determining a voice living body detection result based on the voice information of the target object;
determining a weight corresponding to the video living body detection result and a weight corresponding to the voice living body detection result based on the environment information;
and determining a living body detection result of the target object based on the video living body detection result and the weight corresponding to the video living body detection result, and the voice living body detection result and the weight corresponding to the voice living body detection result.
2. The method of claim 1, wherein the environmental information comprises: ambient light intensity;
the determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information comprises the following steps:
determining a weight corresponding to the video living body detection result based on a first mapping relation; the first mapping relation is used for representing the relation between the ambient light intensity and the weight corresponding to the video living body detection result;
And determining the weight corresponding to the voice living body detection result based on the weight corresponding to the video living body detection result.
3. The method of claim 1, wherein the environmental information comprises: noise intensity;
the determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information comprises the following steps:
determining the weight corresponding to the voice living body detection result based on a second mapping relation; the second mapping relation is used for representing the relation between the noise intensity and the weight corresponding to the voice living body detection result;
and determining the weight corresponding to the video living body detection result based on the weight corresponding to the voice living body detection result.
4. The method of claim 1, wherein the environmental information comprises: ambient light intensity and noise intensity;
the determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information comprises the following steps:
determining a weight corresponding to the video living body detection result based on the third mapping relation; the third mapping relationship is used for representing a relationship between the ambient light intensity and the weight corresponding to the video living body detection result;
Determining a weight corresponding to the voice living body detection result based on the fourth mapping relation; the fourth mapping relationship is used for representing a relationship between the noise intensity and the weight corresponding to the voice living body detection result.
5. The method of claim 1, the shooting interface comprising random dazzle information and random text information;
the shooting the target object based on the shooting interface to obtain video information and voice information of the target object comprises the following steps:
shooting the target object based on the random dazzle information in the shooting interface to obtain dazzle video information and voice information of the target object;
the determining a video living body detection result based on the video information of the target object comprises the following steps:
determining a dazzle living body detection result based on the dazzle video information of the target object; the dazzle living body detection result is the video living body detection result;
the determining a voice living body detection result based on the voice information of the target object comprises the following steps:
and determining a voice living body detection result based on the consistency of the voice information of the target object and the random text information.
6. The method of claim 5, after the obtaining the video information and the voice information of the target object, before the determining the living body detection result of the target object, the method further comprising:
extracting the mouth characteristics in the video information of the target object to obtain a mouth characteristic video of the target object;
determining lip language living body detection results based on the mouth feature video of the target object and the voice information of the target object;
the determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information comprises the following steps:
determining a weight corresponding to the dazzle living body detection result, a weight corresponding to the voice living body detection result and a weight corresponding to the lip language living body detection result based on the environment information;
the determining the living body detection result of the target object based on the video living body detection result and the weight corresponding to the video living body detection result, and the voice living body detection result and the weight corresponding to the voice living body detection result comprises the following steps:
and determining a living body detection result of the target object based on the dazzle living body detection result and the weight corresponding to the dazzle living body detection result, the voice living body detection result and the weight corresponding to the voice living body detection result, and the lip language living body detection result and the weight corresponding to the lip language living body detection result.
7. The method of claim 6, the determining a lip language living body detection result based on the mouth feature video of the target object and the voice information of the target object comprising:
sampling a mouth feature video of the target object based on a target time interval to obtain a mouth image set of the target object; wherein the mouth image set comprises a plurality of mouth image frames;
performing word segmentation processing on the voice information of the target object to obtain a voice word segmentation set of the target object; wherein the speech word segmentation set comprises a plurality of speech word segmentation fragments;
aligning a plurality of mouth image frames in the mouth image set and a plurality of voice word segmentation fragments in the voice word segmentation set to obtain lip information of the target object;
and comparing the lip language information of the target object with the lip language characteristic information corresponding to the random text to obtain a lip language living body detection result.
8. The method of claim 5, the shooting interface further comprising eye indication information;
after obtaining the video information and the voice information of the target object and before determining the living body detection result of the target object, the method further comprises:
extracting eye features in the dazzle video information of the target object to obtain an eye feature video of the target object;
determining an eye living body detection result based on consistency of the eye feature video of the target object and eye indication information in the shooting interface;
the determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information comprises the following steps:
determining a weight corresponding to the dazzle living body detection result, a weight corresponding to the voice living body detection result, a weight corresponding to the lip language living body detection result and a weight corresponding to the eye living body detection result based on the environment information;
the determining the living body detection result of the target object based on the video living body detection result and the weight corresponding to the video living body detection result, and the voice living body detection result and the weight corresponding to the voice living body detection result comprises the following steps:
and determining a living body detection result of the target object based on the dazzle living body detection result and the weight corresponding to the dazzle living body detection result, the voice living body detection result and the weight corresponding to the voice living body detection result, the lip language living body detection result and the weight corresponding to the lip language living body detection result, and the eye living body detection result and the weight corresponding to the eye living body detection result.
9. A living body detection apparatus, the apparatus comprising:
the acquisition module is used for displaying a shooting interface and acquiring environment information of a target object;
the obtaining module is used for photographing the target object based on the photographing interface to obtain video information and voice information of the target object;
the first determining module is used for determining a video living body detection result based on the video information of the target object and determining a voice living body detection result based on the voice information of the target object;
the second determining module is used for determining the weight corresponding to the video living body detection result and the weight corresponding to the voice living body detection result based on the environment information;
and a third determining module, configured to determine a living body detection result of the target object based on the video living body detection result and the weight corresponding to the video living body detection result, and the voice living body detection result and the weight corresponding to the voice living body detection result.
10. A computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the method steps of any of claims 1-8.
11. An electronic device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by a processor and to perform the method steps of any of claims 1-8.
12. A program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1 to 8.
CN202211695760.XA 2022-12-28 2022-12-28 Living body detection method, living body detection device and computer storage medium Pending CN116168452A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211695760.XA CN116168452A (en) 2022-12-28 2022-12-28 Living body detection method, living body detection device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211695760.XA CN116168452A (en) 2022-12-28 2022-12-28 Living body detection method, living body detection device and computer storage medium

Publications (1)

Publication Number Publication Date
CN116168452A true CN116168452A (en) 2023-05-26

Family

ID=86415678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211695760.XA Pending CN116168452A (en) 2022-12-28 2022-12-28 Living body detection method, living body detection device and computer storage medium

Country Status (1)

Country Link
CN (1) CN116168452A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination