WO2024051380A1 - Living body detection method and apparatus, electronic device, and storage medium - Google Patents

Living body detection method and apparatus, electronic device, and storage medium

Info

Publication number
WO2024051380A1
WO2024051380A1 · PCT/CN2023/109776 · CN2023109776W
Authority
WO
WIPO (PCT)
Prior art keywords
detected
sound signal
sound source
living body
sound
Prior art date
Application number
PCT/CN2023/109776
Other languages
French (fr)
Chinese (zh)
Inventor
黄石磊
刘轶
程刚
廖晨
蒋志燕
Original Assignee
深圳市北科瑞声科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市北科瑞声科技股份有限公司
Publication of WO2024051380A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • the present disclosure relates to the field of biometric identification technology, specifically to living body detection methods, devices, electronic equipment and storage media.
  • Liveness detection is a method to determine the true physiological characteristics of objects in some identity verification scenarios.
  • liveness detection can use facial key-point localization and face tracking, combined with actions such as blinking, opening the mouth, shaking the head, and nodding.
  • it is a physiological-sign detection technique that verifies whether a real, live person is operating the device, in order to resist common attacks such as photos, face swapping, masks, occlusion, and screen replay.
  • the present disclosure provides living body detection methods, devices, electronic equipment and storage media.
  • the present disclosure provides a living body detection method, which method includes:
  • the sound source position and the lip position are compared for consistency, and the liveness detection result of the object to be detected is determined based on the comparison results.
  • the sound source position and the lip position are compared for consistency, including:
  • determining the living body detection result of the subject to be detected based on the comparison results includes:
  • the living body detection result of the object to be detected is determined to be a living body.
  • the method when the comparison result indicates that the sound source position and the lip position are consistent, the method further includes:
  • the step of determining that the living body detection result of the object to be detected is a living body is executed.
  • the method before determining the sound signal to be detected, the method further includes:
  • Output interactive instructions which are used to instruct the object to be detected to emit a sound signal corresponding to the preset text data
  • the step of determining the position of the sound source corresponding to the sound signal is performed.
  • the method further includes:
  • the generation process of interactive instructions includes:
  • determining the sound signal to be detected includes:
  • the sound signals collected by each microphone are synthesized and processed to obtain the sound signal to be detected.
  • determining the sound source position corresponding to the sound signal to be detected includes:
  • the present disclosure provides a living body detection device, which device includes:
  • the first determination module is configured to determine the sound signal to be detected and the image to be detected corresponding to the sound signal to be detected, where the image to be detected is an image containing the face of the subject to be detected;
  • a second determination module configured to determine the sound source position corresponding to the sound signal to be detected, and determine the lip position of the object to be detected based on the image to be detected;
  • the third determination module is configured to compare the sound source position and the lip position for consistency, and determine the living body detection result of the object to be detected based on the comparison result.
  • the third determination module is configured as:
  • the third determination module is configured as:
  • if the comparison result indicates that the sound source position and the lip position are inconsistent, it is determined that the living body detection result of the object to be detected is non-living body;
  • if they are consistent, the living body detection result of the object to be detected is determined to be a living body.
  • the device further includes:
  • the input module is configured to input the image to be detected and the sound signal to be detected to the trained mouth shape recognition model to obtain the output result of the mouth shape recognition model when the comparison result indicates that the sound source position and the lip position are consistent;
  • the first execution module is configured to execute the step of determining that the living body detection result of the object to be detected is a living body if the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.
  • the device further includes:
  • the output module is configured to output an interactive instruction before determining the sound signal to be detected, and the interactive instruction is used to instruct the object to be detected to emit a sound signal corresponding to the preset text data;
  • a recognition module configured to perform speech recognition on the sound signal to be detected and obtain text data corresponding to the sound signal to be detected before determining the sound source position corresponding to the sound signal to be detected;
  • a comparison module configured to compare the text data corresponding to the sound signal to be detected with the preset text data for consistency
  • the second execution module is configured to execute the step of determining the sound source position corresponding to the sound signal if the comparison result indicates that the text data corresponding to the sound signal to be detected is consistent with the preset text data.
  • the device further includes:
  • the third execution module is configured to return to the step of outputting interactive instructions if the comparison result indicates that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data.
  • the output module is configured as:
  • the first determining module is configured as:
  • the sound signals collected by each microphone are synthesized and processed to obtain the sound signal to be detected.
  • the second determination module is configured as:
  • intersection position of multiple sound source directions is determined as the sound source position of the sound signal to be detected.
  • the present disclosure provides an electronic device, including: a processor and a memory.
  • the processor is configured to execute a living body detection program stored in the memory to implement the living body detection method described in the present disclosure.
  • the present disclosure provides a storage medium.
  • the storage medium stores one or more programs.
  • the one or more programs can be executed by one or more processors to implement the living body detection method described in the present disclosure.
  • in the solution of the present disclosure, the sound source position of the sound signal to be detected and the lip position of the object to be detected can be located directly. When the sound source position and the lip position are determined to be consistent, this indicates that the sound signal to be detected is emitted from the lips of the object to be detected and that the object is a living body; otherwise, the object is a non-living body. Thus, even if a non-authenticated user impersonates an authenticated user by obtaining that user's video image, the object to be detected can still be identified as a non-living body, which improves the accuracy and reliability of the living body detection results.
  • Figure 1 is a schematic diagram of an application scenario involved in an embodiment of the present disclosure
  • Figure 2 is a schematic diagram of an application scenario involved in an embodiment of the present disclosure
  • Figure 3 is a flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • Figure 4 is a flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • Figure 5 is a distribution diagram of microphones on a living body detection device provided by an embodiment of the present disclosure
  • Figure 6 is a schematic diagram of a living body detection device provided by an embodiment of the present disclosure determining a sound source position through microphones
  • Figure 7 is a schematic diagram of the sound source position provided by an embodiment of the present disclosure.
  • Figure 8 is a schematic diagram of the sound source position provided by an embodiment of the present disclosure.
  • Figure 9 is a flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • Figure 10 is a flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • Figure 11 is a block diagram of a living body detection device provided by an embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • Figure 1 is a schematic diagram of an application scenario involved in an embodiment of the present disclosure.
  • the application scenario shown in Figure 1 includes: user 11 and the living body detection device 13.
  • the living body detection device 13 may run a living body detection system and may be any of various electronic devices with a display screen, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
  • the above-mentioned display screen can be used to display the video signal captured by the camera and to prompt the position of the face to be detected.
  • a device with a display screen is taken as an example to illustrate the living body detection device 13.
  • user 11 can undergo liveness detection in the normal way.
  • the normal way here means that user 11 stands in front of the camera of the living body detection device 13; the device then directly captures a video image containing the face of user 11 through the camera and performs liveness detection on user 11 based on that video image.
  • FIG. 2 is a schematic diagram of an application scenario involved in an embodiment of the present disclosure.
  • the application scenario includes: user 11, user 12, living body detection device 13, terminal 14, and terminal 15. Among them, terminal 14 and terminal 15 can perform network communication.
  • Terminal 14 and terminal 15 may be hardware devices or software that support network connections to provide various network services.
  • terminals 14 and 15 may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers; in Figure 2, smartphones are used as an example.
  • if terminal 14 and terminal 15 are software, they can be installed on the electronic devices listed above. In this embodiment, terminal 14 and terminal 15 establish a video call by installing corresponding applications.
  • user 11 can point the display screen of terminal 14 toward the camera of the living body detection device 13, so that the device receives the video image of user 12 through its camera. Because the living body detection device 13 can detect vital signs from the video image of user 12, the detection passes. It can be seen that, in the existing technology, a non-authenticated user can impersonate an authenticated user by obtaining the authenticated user's video image and thereby pass liveness detection.
  • therefore, an embodiment of the present disclosure provides a living body detection method to prevent non-authenticated users from impersonating authenticated users by obtaining their video images, and to improve the accuracy of the detection results.
  • FIG 3 is a flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • the process shown in Figure 3 can be applied to a living body detection device, such as the living body detection device 13 shown in Figure 1 .
  • the process may include the following steps:
  • Step 301 Determine the sound signal to be detected and the image to be detected corresponding to the sound signal to be detected, where the image to be detected is an image containing the face of the subject to be detected;
  • Step 302 Determine the sound source position corresponding to the sound signal to be detected, and determine the lip position of the object to be detected based on the image to be detected;
  • Step 303 Compare the sound source position and the lip position for consistency, and determine the living body detection result of the object to be detected based on the comparison result.
  • the above-mentioned sound signal to be detected is the sound signal received by the microphone when the living body detection device performs detection.
  • the above-mentioned image to be detected is the image collected by the camera during detection, and it contains the face of the subject to be detected.
  • the number of images to be detected can be one or more. When there are multiple images to be detected, they may be multiple frames of a video collected by the living body detection device through the camera.
  • since both the sound signal to be detected and the image to be detected are obtained by the living body detection device during the same detection, they correspond to each other.
  • the object to be detected is a real object.
  • the user 11 is the object to be detected.
  • the living body detection device 13 can directly capture an image containing the face of user 11 through the camera to obtain the image to be detected.
  • user 11 can directly emit a sound signal, which the living body detection device receives through the microphone; in this way, the living body detection device determines the sound signal to be detected.
  • the object to be detected is a virtual object.
  • the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected.
  • the living body detection device 13 can collect an image containing the face of user 12 through the camera.
  • user 12 can emit a sound signal, which is collected by terminal 15 and sent to terminal 14.
  • terminal 14 can play the sound signal through its speaker.
  • the living body detection device 13 can then receive the sound signal to be detected through the microphone.
  • user 11 can also emit a sound signal, and the living body detection device 13 can receive the sound signal to be detected through a microphone.
  • the above-mentioned living body detection device may be provided with multiple microphones, usually at different locations.
  • for example, the living body detection device is provided with four microphones, one at each of its four corners.
  • when the living body detection device determines the sound signal to be detected, it can obtain the sound signals collected by the multiple microphones, yielding multiple sound signals.
  • the living body detection device can synthesize the sound signals collected by each microphone and take the synthesized signal as the sound signal to be detected; in this way, noise can be reduced and a clearer, more accurate sound signal obtained.
  • alternatively, the living body detection device can take the sound signal from any one of the above microphones as the sound signal to be detected.
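The synthesis step above can be realized, for example, by delay-and-sum beamforming; the disclosure does not name a specific algorithm, so the sketch below (pure Python, integer sample delays) is only one plausible realization:

```python
def delay_and_sum(signals, delays):
    """Shift each microphone channel by its estimated delay (in samples)
    and average the aligned channels."""
    # Usable length after shifting every channel.
    n = min(len(s) - d for s, d in zip(signals, delays))
    aligned = [s[d:d + n] for s, d in zip(signals, delays)]
    # Averaging reinforces the coherent speech component and attenuates
    # uncorrelated noise picked up by the individual microphones.
    return [sum(samples) / len(samples) for samples in zip(*aligned)]

# Toy example: the same 10-sample ramp reaches microphone 2 two samples late.
ramp = [float(i) for i in range(10)]
mic1 = ramp
mic2 = [0.0, 0.0] + ramp[:8]
combined = delay_and_sum([mic1, mic2], [0, 2])
```

In practice the per-channel delays would come from cross-correlating the channels; here they are given directly to keep the example short.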
  • The following is a unified description of step 302 and step 303:
  • the living body detection device can locate the sound source position corresponding to the sound signal to be detected. How to locate the sound source will be explained below through the process shown in Figure 4, which will not be described in detail here;
  • the above-mentioned image to be detected contains the face of the subject to be detected. Based on this, the life detection device can determine the lip position of the subject to be detected by performing facial recognition on the image to be detected;
  • the sound source position and the lip position can be compared for consistency, and the living body detection result of the object to be detected can be determined based on the comparison result. Specifically, when the comparison result indicates that the sound source position and the lip position are inconsistent, it indicates that the sound signal to be detected is not emitted from the lips of the subject to be detected, indicating that the living body detection result of the subject to be detected is non-living.
  • the object to be detected is a virtual object.
  • the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected.
  • the living body detection device 13 can collect images including the video image of user 12 through a camera.
  • the user 12 can send out a sound signal, which is collected by the terminal 15 and sent to the terminal 14.
  • terminal 14 can play the sound signal through its speaker, and the speaker of terminal 14 may be located above, below, to the left, or to the right of the terminal 14 housing. That is, the sound source position of the sound signal may be above, below, to the left, or to the right of the housing. As shown in Figure 7, the sound source position may be located in any of areas A, B, C, D, or E, which is inconsistent with the lip position of the object to be detected.
  • alternatively, if user 11 emits a sound signal, the sound source position is at user 11's position, which is also inconsistent with the lip position of the object to be detected. In either case, it can be determined that the living body detection result of the object to be detected is non-living body.
  • the comparison result indicates that the sound source position and the lip position are consistent, it indicates that the sound signal to be detected is emitted from the lips of the subject to be detected, indicating that the living body detection result of the subject to be detected is a living body.
  • the object to be detected is a real object.
  • the user 11 is the object to be detected.
  • the living body detection device 13 can directly collect an image containing the face of user 11 through the camera to obtain the image to be detected.
  • user 11 can directly emit a sound signal, which the living body detection device receives through the microphone; in this way, the device determines the sound signal to be detected.
  • the sound source position as shown in Figure 8 can be obtained.
  • assuming the sound source position of the sound signal to be detected is located in area F, and area F is known to be the lip area of the subject to be detected, the sound source position is consistent with the lip position of the subject. At this time, it can be determined that the living body detection result of the subject to be detected is a living body.
  • in the above method, the image to be detected is an image containing the face of the object to be detected; the sound source position of the sound signal to be detected is determined, the lip position of the object to be detected is determined based on the image to be detected, the sound source position and the lip position are compared for consistency, and the liveness detection result of the object to be detected is determined based on the comparison result.
  • the sound source position of the sound signal to be detected and the lip position of the object to be detected can be directly located, and when it is determined that the sound source position and the lip position are consistent, it means that the sound signal to be detected is generated by the lips of the object to be detected.
  • the object to be detected is a living body; otherwise, the object to be detected is a non-living body. It is realized that even if a non-authenticated user impersonates an authenticated user by obtaining the authenticated user's video image, the object to be detected can still be identified as a non-living body, which improves the accuracy and reliability of the living body detection results.
  • FIG 4 is a flow chart of a living body detection method provided by an embodiment of the present disclosure.
  • the process shown in Figure 4 is based on the process shown in Figure 3 above, describing how the living body detection device locates the sound source position of the sound signal to be detected. As shown in Figure 4, the process may include the following steps:
  • Step 401 Decompose the sound signal to be detected to obtain multiple decomposed signals
  • Step 402 Determine the sound source direction of each decomposed signal
  • Step 403 Determine the intersection position of multiple sound source directions as the sound source position of the sound signal to be detected.
  • the above-described decomposed signal may be a sound signal in any direction among the sound signals in different directions contained in the sound signal to be detected.
  • the living body detection device may use microphone-array signal processing techniques to determine the sound source position of the sound signal to be detected.
  • N microphones may be provided on the above-mentioned living body detection device.
  • N may be greater than or equal to 3.
  • as shown in FIG. 5, which is a distribution diagram of microphones on a living body detection device provided by an embodiment of the present disclosure, the detection device is equipped with four microphones, one at each corner: microphone 1, microphone 2, microphone 3, and microphone 4.
  • in step 301, when multiple microphones are provided on the living body detection device, the sound signals collected by the multiple microphones can be obtained and synthesized to obtain the sound signal to be detected.
  • when determining the sound source position of the sound signal to be detected, the sound signal to be detected can be decomposed to obtain multiple decomposed signals, where each decomposed signal can correspond to a microphone.
  • The following is a unified description of step 402 and step 403:
  • each of the above decomposed signals may correspond to a microphone. Since each microphone can locate the sound source direction of the sound signal when receiving the sound signal, the sound source direction of each decomposed signal can be determined through the microphone corresponding to each decomposed signal.
  • the sound source direction of each decomposed signal can be determined.
  • the intersection position of the plurality of sound source directions may be determined as the sound source position of the sound signal to be detected.
  • each microphone can correspond to a decomposed signal of the sound signal to be detected, and the sound source direction of the decomposed signal can be determined. In this way, four sound source directions can be obtained. The intersection position of the four sound source directions is point A, then point A can be determined as the sound source position of the sound signal to be detected.
  • in this way, multiple decomposed signals are obtained by decomposing the sound signal to be detected, the sound source direction of each decomposed signal is determined, and the intersection position of the multiple sound source directions is determined as the sound source position of the sound signal to be detected, enabling more accurate localization of the sound source.
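One standard way to realize step 403 is to compute the least-squares point closest to all of the per-microphone direction rays; this is an illustrative sketch, not necessarily the method used in the disclosure, and positions are kept 2-D for simplicity:

```python
def intersect_directions(mic_positions, directions):
    """Least-squares intersection of 2-D rays, each given by a microphone
    position and a (mic -> source) direction vector."""
    # Accumulate A = sum(I - d d^T) and b = sum((I - d d^T) p) over all rays;
    # the solution of A x = b is the point closest to every ray.
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for (px, py), (dx, dy) in zip(mic_positions, directions):
        norm = (dx * dx + dy * dy) ** 0.5
        dx, dy = dx / norm, dy / norm
        m = [[1 - dx * dx, -dx * dy], [-dx * dy, 1 - dy * dy]]
        A[0][0] += m[0][0]; A[0][1] += m[0][1]
        A[1][0] += m[1][0]; A[1][1] += m[1][1]
        b[0] += m[0][0] * px + m[0][1] * py
        b[1] += m[1][0] * px + m[1][1] * py
    # Solve the 2x2 system A x = b by Cramer's rule.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    x = (b[0] * A[1][1] - b[1] * A[0][1]) / det
    y = (A[0][0] * b[1] - A[1][0] * b[0]) / det
    return x, y

# Four corner microphones (as in Figure 5) all hearing a source at (1, 1).
mics = [(0, 0), (2, 0), (0, 2), (2, 2)]
dirs = [(1, 1), (-1, 1), (1, -1), (-1, -1)]
src = intersect_directions(mics, dirs)
```

With noisy real-world direction estimates the rays do not meet in a single point, and the least-squares formulation returns the point of minimum total distance to all rays.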
  • FIG. 9 is a flow chart of a living body detection method provided by an embodiment of the present disclosure. As shown in Figure 9, the process may include the following steps:
  • Step 901 Determine the sound signal to be detected and the image to be detected corresponding to the sound signal to be detected, where the image to be detected is an image containing the face of the subject to be detected;
  • Step 902 Determine the sound source position corresponding to the sound signal to be detected, and determine the lip position of the object to be detected based on the image to be detected;
  • Step 903 Determine the reference space area based on the lip position
  • Step 904 Determine whether the sound source position is within the reference space area. If so, perform step 906; if not, perform step 905;
  • Step 905 Determine that the living body detection result of the object to be detected is non-living body
  • Step 906 Input the image to be detected and the sound signal to be detected to the trained mouth shape recognition model, and obtain the output result of the above mouth shape recognition model;
  • Step 907 Determine whether the above output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected. If yes, step 908 is executed; if not, step 905 is executed;
  • Step 908 Determine that the living body detection result of the object to be detected is a living body.
  • For step 901 and step 902, please refer to the description of steps 301 to 302 above; they will not be described again here.
  • The following is a unified description of steps 903 to 905:
  • a reference space area may first be determined based on the above-mentioned lip position.
  • a sphere or cuboid area can be set as the reference space area centered on the above-mentioned lip position. Then, it is determined whether the sound source position is within the reference space area. If so, it can be determined that the sound source position and the lip position are consistent; if not, it can be determined that the sound source position and the lip position are inconsistent.
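The sphere variant of this check can be sketched as a point-in-sphere test; the 0.15 m radius below is an assumed tolerance for illustration, not a value from the disclosure:

```python
def positions_consistent(sound_src, lip_pos, radius=0.15):
    """True when the sound source lies inside a sphere of `radius`
    (an assumed tolerance, in metres) centred on the lip position."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(sound_src, lip_pos))
    return dist_sq <= radius ** 2

lips = (0.0, 0.0, 0.0)
near = positions_consistent((0.05, 0.02, 0.0), lips)  # a few cm from the lips
far = positions_consistent((0.6, 0.0, 0.0), lips)     # e.g. a phone speaker
```

A cuboid reference area would instead compare each coordinate against per-axis bounds, which allows different tolerances in depth versus height.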
  • The following is a unified description of steps 906 to 908:
  • the image to be detected and the sound signal to be detected can be input into the above trained mouth shape recognition model to obtain an output result indicating whether the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.
  • if the output result is 1, the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected; if the output result is 0, it does not match.
  • the living body detection result of the object to be detected is a living body.
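Taken together, steps 904 to 908 amount to a two-stage decision in which both checks must pass; a minimal sketch (the 1/0 convention follows the text above, the result strings are illustrative):

```python
def liveness_result(positions_consistent, mouth_model_output):
    """Stage 1: sound source vs. lip position. Stage 2: the mouth shape
    recognition model's 1/0 output. Both must pass for a living body."""
    if not positions_consistent:
        return "non-living body"
    return "living body" if mouth_model_output == 1 else "non-living body"

case_match = liveness_result(True, 1)     # consistent position, mouth matches
case_mismatch = liveness_result(True, 0)  # consistent position, mouth closed
case_far = liveness_result(False, 1)      # sound comes from elsewhere
```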
  • suppose user 11 and the living body detection device 13 are in the same environment. When user 11, that is, the object to be detected, emits an "ah" sound at a certain moment, the living body detection device can directly capture, through the camera, an image containing the face of the object to be detected at that moment to obtain the image to be detected, and receive the "ah" sound signal through the microphone to obtain the sound signal to be detected.
  • the above-mentioned image to be detected and the sound signal to be detected are input to the above-mentioned mouth shape recognition model.
  • the mouth shape recognition model can determine that the mouth shape of the object to be detected is an "ah" mouth shape, which matches the sound signal to be detected; thus it can be determined that the object to be detected is a living body.
  • the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected.
  • the living body detection device 13 can collect the image containing the video image of user 12 through the camera. Assume that at a certain moment user 11 emits an "ah" sound signal while user 12 emits no sound.
  • in this case, the image to be detected collected by the living body detection device is the video image of user 12, while the sound signal to be detected is the sound signal emitted by user 11.
  • the above sound signal to be detected and image to be detected are input into the mouth shape recognition model. Since user 12 emits no sound at the current moment, recognition of the image to be detected shows that the mouth of the object to be detected is closed, whereas the mouth shape corresponding to the "ah" sound signal should be open. The model therefore outputs that the sound signal to be detected does not match the mouth shape of the object to be detected, and the object to be detected is determined to be a non-living body.
  • when the sound source position is located within the reference space area determined based on the lip position, a further determination can be made, based on the model's output result, as to whether the living body detection result of the object to be detected is a living body.
  • that is, whether the object to be detected is a living body is determined by further checking whether the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.
  • in this way, even if a non-authenticated user impersonates an authenticated user by obtaining the authenticated user's video image, the object to be detected can still be identified as a non-living body, which improves the accuracy and reliability of the liveness detection results.
  • Figure 10 is a flow chart of a living body detection method provided by an embodiment of the present disclosure. As shown in Figure 10, the process may include the following steps:
  • Step 1001. Output an interactive instruction, which is used to instruct the object to be detected to emit a sound signal corresponding to the preset text data;
  • Step 1002 Determine the sound signal to be detected and the image to be detected corresponding to the sound signal to be detected.
  • the image to be detected is an image containing the face of the object to be detected;
  • Step 1003 Perform speech recognition on the sound signal to be detected, and obtain text data corresponding to the sound signal to be detected;
  • Step 1004 Compare the text data corresponding to the sound signal to be detected with the preset text data for consistency;
  • Step 1005 Determine whether the comparison result indicates that the recognized text data is consistent with the preset text data. If yes, execute step 1006; if not, execute step 1001;
  • Step 1006 Determine the sound source position corresponding to the sound signal to be detected, and determine the lip position of the object to be detected based on the image to be detected;
  • Step 1007 Determine whether the position of the sound source and the position of the lips are consistent. If yes, execute step 1009; if not, execute step 1008;
  • Step 1008 Determine that the living body detection result of the object to be detected is a non-living body;
  • Step 1009 Input the image to be detected and the sound signal to be detected to the trained mouth shape recognition model, and obtain the output result of the above mouth shape recognition model;
  • Step 1010 Determine whether the above output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected. If yes, step 1011 is executed; if not, step 1008 is executed;
  • Step 1011 Determine that the living body detection result of the object to be detected is a living body.
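The flow of steps 1001 to 1011 above can be sketched as a single control loop. Every callable passed in below (prompt generation, capture, speech recognition, localization, the mouth shape model, and the retry limit) is a placeholder standing in for the components described in the flowchart, not an actual implementation.

```python
def liveness_check(generate_prompt, capture, recognize_text,
                   locate_source, locate_lips, positions_match,
                   mouth_shape_matches, max_attempts=3):
    """Illustrative control flow for steps 1001-1011 of the flowchart."""
    for _ in range(max_attempts):
        expected_text = generate_prompt()            # step 1001: output instruction
        sound, image = capture()                     # step 1002: signal + image
        if recognize_text(sound) != expected_text:   # steps 1003-1005: ASR + compare
            continue                                 # inconsistent: re-issue instruction
        source = locate_source(sound)                # step 1006: sound source position
        lips = locate_lips(image)                    # step 1006: lip position
        if not positions_match(source, lips):        # step 1007
            return False                             # step 1008: non-living body
        return mouth_shape_matches(image, sound)     # steps 1009-1011
    return False
```

The `max_attempts` cap is an assumption; the flowchart itself loops back to step 1001 without bounding the retries.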
  • the above interactive instruction can be generated by the living body detection device through the following method: first, calling a preset random number generation algorithm to generate a random array; then, generating the above preset text data based on the random array, and generating the interactive instruction based on the preset text data.
  • the living body detection device calls the preset random number generation algorithm to generate the following random array: 265910. Then, the living body detection device determines the random array as preset text data, generates an interactive instruction for instructing the subject to be detected to say 265910 based on the preset text data, and outputs the interactive instruction.
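The random-array generation exemplified above can be sketched as follows. The use of Python's `secrets` module and the six-digit length are illustrative assumptions; the disclosure only requires a preset random number generation algorithm and an instruction built from the resulting text.

```python
import secrets

def generate_preset_text(n_digits: int = 6) -> str:
    """Generate a random digit string to serve as the preset text data (e.g. '265910')."""
    return ''.join(str(secrets.randbelow(10)) for _ in range(n_digits))

def make_instruction(text: str) -> str:
    """Wrap the preset text data in an interactive instruction for the user."""
    return f"Please read the following digits aloud: {text}"
```

Because the digits are freshly generated for each detection attempt, a pre-recorded video of an authenticated user cannot contain the correct utterance.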
  • the life detection device can receive the sound signal through the microphone and determine the sound signal as the sound signal to be detected.
  • another object in the same environment as the life detection device as exemplified above can also speak the preset text data according to the interactive instruction, thereby generating a sound signal.
  • the life detection device can receive the sound signal through the microphone and determine the sound signal as the sound signal to be detected.
  • the living body detection device can perform speech recognition on the above sound signal to be detected through ASR (Automatic Speech Recognition) to obtain text data.
  • the life detection device can perform speech recognition on the above-mentioned sound signal to be detected through a convolutional neural network algorithm to obtain text data.
  • step 1001: when the living body detection device issues an interactive instruction instructing the object to be detected to say 265910, the object to be detected emits a corresponding sound signal according to the interactive instruction, and the living body detection device receives the sound signal through the microphone to obtain the sound signal to be detected. Afterwards, the living body detection device performs speech recognition on the sound signal to be detected and obtains the text data 265910.
  • The following is a unified description of steps 1004 to 1005:
  • the text data corresponding to the sound signal to be detected can be compared with the preset text data for consistency. Specifically, if the comparison result shows that the recognized text data is inconsistent with the preset text data, it means that the object to be detected did not speak the preset text data according to the interactive instruction output by the living body detection device. At this time, to allow for hearing errors by the object to be detected or recognition errors by the living body detection device, the living body detection device can regenerate the interactive instruction and output it.
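The consistency comparison of steps 1004 to 1005 can be sketched as a normalized string comparison. Ignoring whitespace and letter case is an assumption made here to tolerate formatting differences in the ASR output; the disclosure only requires a consistency comparison.

```python
def texts_consistent(recognized: str, preset: str) -> bool:
    """Compare recognized text with the preset text, ignoring whitespace and case."""
    def norm(s: str) -> str:
        return ''.join(s.split()).lower()
    return norm(recognized) == norm(preset)
```

An inconsistent result would send the flow back to step 1001 (re-issuing the instruction); a consistent result allows step 1006 to proceed.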
  • if consistent, step 1006 can be performed to further determine whether the object to be detected is a living body.
  • steps 1006 to 1008 please refer to the descriptions of steps 302 and 303 above, and will not be described again here.
  • steps 1009 to 1011 please refer to the description of steps 906 to 908 above, and will not be described again here.
  • an interactive instruction can be output to instruct the object to be detected to emit a sound signal corresponding to the preset text data; after the sound signal to be detected is determined, speech recognition is performed on it, and the recognized text data is compared with the preset text data for consistency. If they are consistent, the step of determining the sound source position of the sound signal to be detected can be performed.
  • through the interactive instruction, the object to be detected can be instructed to emit a sound signal corresponding to the preset text data, and it can be initially determined whether the object to be detected can interact with the living body detection device. This prevents counterfeiting with video images recorded in advance, so that when a non-authenticated user imitates an authenticated user by obtaining video images recorded in advance by the authenticated user, the object to be detected can be identified as non-living more quickly and accurately, thus improving the accuracy and reliability of the liveness detection results.
  • FIG. 11 is a block diagram of a living body detection device 110 provided according to an embodiment of the present disclosure. As shown in Figure 11, the device includes:
  • the first determination module 111 is used to determine the sound signal to be detected and the image to be detected corresponding to the sound signal to be detected, where the image to be detected is an image containing the face of the object to be detected;
  • the second determination module 112 is used to determine the sound source position corresponding to the sound signal to be detected, and determine the lip position of the object to be detected based on the image to be detected;
  • the third determination module 113 is configured to compare the sound source position and the lip position for consistency, and determine the living body detection result of the object to be detected based on the comparison result.
  • the third determination module 113 is configured as:
  • if the comparison result indicates that the sound source position and the lip position are inconsistent, it is determined that the living body detection result of the object to be detected is a non-living body;
  • if the comparison result indicates that the sound source position and the lip position are consistent, it is determined that the living body detection result of the object to be detected is a living body.
  • the device further includes (not shown in the figure):
  • an input module configured to input the image to be detected and the sound signal to be detected into a trained mouth shape recognition model when the comparison result indicates that the sound source position and the lip position are consistent, to obtain the output result of the mouth shape recognition model;
  • a first execution module configured to execute the step of determining that the living body detection result of the object to be detected is a living body if the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.
  • the device further includes (not shown in the figure):
  • An output module configured to output an interactive instruction before determining the sound signal to be detected, the interactive instruction being used to instruct the object to be detected to emit a sound signal corresponding to the preset text data;
  • a recognition module configured to perform speech recognition on the sound signal to be detected before determining the sound source position corresponding to the sound signal to be detected, and to obtain text data corresponding to the sound signal to be detected;
  • a comparison module configured to compare the text data corresponding to the sound signal to be detected with the preset text data for consistency;
  • the second execution module is configured to execute the step of determining the sound source position corresponding to the sound signal if the comparison result indicates that the text data corresponding to the sound signal to be detected is consistent with the preset text data.
  • the device further includes (not shown in the figure):
  • a third execution module configured to return to executing the step of outputting the interactive instruction if the comparison result indicates that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data.
  • the output module is configured to:
  • the preset text data is generated based on the random array, and the interactive instruction is generated based on the preset text data.
  • the first determining module 111 is configured as:
  • the sound signals collected by each of the microphones are synthesized and processed to obtain the sound signal to be detected.
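The multi-microphone synthesis step can be sketched as follows. The disclosure only states that the per-microphone signals are synthesized, so the align-and-average (delay-and-sum) scheme and the integer sample delays below are assumptions standing in for whatever synthesis the device actually uses.

```python
def synthesize(signals, delays):
    """Align each microphone channel by its known sample delay, then average sample-wise.

    signals: list of per-microphone sample lists
    delays: per-channel integer sample delays (assumed known from array geometry)
    """
    aligned = [s[d:] for s, d in zip(signals, delays)]
    n = min(len(a) for a in aligned)  # truncate to the shortest aligned channel
    return [sum(a[i] for a in aligned) / len(aligned) for i in range(n)]
```

Averaging aligned channels reinforces the common speech component while attenuating uncorrelated noise, which is why multi-microphone capture yields a cleaner sound signal to be detected.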
  • the second determination module 112 is configured to:
  • intersection position of multiple sound source directions is determined as the sound source position of the sound signal to be detected.
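The "intersection of sound source directions" step can be sketched in two dimensions: each decomposed signal yields a ray (a microphone position plus a bearing), and the sound source position is taken as the intersection of two such rays. The microphone positions, bearing angles, and the 2-D simplification are illustrative assumptions.

```python
import math

def intersect_rays(p1, theta1, p2, theta2):
    """Intersect two 2-D rays given their origins and bearing angles (radians)."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t*d1 == p2 + s*d2 for t via Cramer's rule.
    denom = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(denom) < 1e-9:
        return None  # parallel bearings: no unique intersection
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t = (rx * (-d2[1]) - ry * (-d2[0])) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

With more than two direction estimates, a least-squares point minimizing the distance to all rays would be the natural generalization of this pairwise intersection.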
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 1200 shown in FIG. 12 includes: at least one processor 1201, a memory 1202, at least one network interface 1204, and a user interface 1203.
  • the various components in electronic device 1200 are coupled together through bus system 1205 .
  • the bus system 1205 is used to implement connection communication between these components.
  • the bus system 1205 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled bus system 1205 in FIG. 12 .
  • the user interface 1203 may include a display, a keyboard, or a clicking device (eg, a mouse, a trackball, a touch pad, a touch screen, etc.).
  • the memory 1202 in the embodiment of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM).
  • memory 1202 stores the following elements, executable units or data structures, or a subset thereof, or an extension thereof: operating system 12021 and applications 12022.
  • the operating system 12021 includes various system programs, such as framework layer, core library layer, driver layer, etc., which are used to implement various basic services and process hardware-based tasks.
  • Application 12022 includes various applications, such as media player, browser, etc., and is used to implement various application services.
  • the program that implements the method of an embodiment of the present disclosure may be included in the application program 12022.
  • by calling the programs or instructions stored in the memory 1202 (which in some embodiments may be the programs or instructions stored in the application program 12022), the processor 1201 is used to execute the methods provided by each method embodiment.
  • Method steps include, for example:
  • the sound source position and the lip position are compared for consistency, and the living body detection result of the object to be detected is determined based on the comparison result.
  • the methods disclosed in the above embodiments of the present disclosure can be applied to the processor 1201 or implemented by the processor 1201.
  • the processor 1201 may be an integrated circuit chip and has signal processing capabilities. During the implementation process, each step of the above method can be completed by instructions in the form of hardware integrated logic circuits or software in the processor 1201 .
  • the above-mentioned processor 1201 can be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present disclosure can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software units in the decoding processor.
  • the software unit can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory 1202.
  • the processor 1201 reads the information in the memory 1202 and completes the steps of the above method in combination with its hardware.
  • the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof.
  • the processing unit can be implemented in one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units used to perform the functions described in this application, or combinations thereof.
  • the techniques described herein may be implemented by means of units that perform the functions described herein.
  • Software code may be stored in memory and executed by a processor.
  • the memory can be implemented in the processor or external to the processor.
  • the electronic device provided by the present disclosure can be the electronic device shown in Figure 12, and can perform all the steps of the living body detection methods in Figures 3-4 and Figures 9-10, thereby realizing the technical effects of the living body detection methods shown in Figures 3-4 and Figures 9-10; for brevity, details are not repeated here.
  • Embodiments of the present disclosure also provide storage media (computer-readable storage media).
  • the storage medium here stores one or more programs.
  • the storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk or a solid state drive; it may also include a combination of the above types of memory.
  • One or more programs in the storage medium can be executed by one or more processors to implement the above-mentioned living body detection method executed on the electronic device side.
  • the processor is used to execute the living body detection program stored in the memory to implement the following steps of the living body detection method executed on the electronic device side:
  • the sound source position and the lip position are compared for consistency, and the living body detection result of the object to be detected is determined based on the comparison result.
  • The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented in hardware, in software modules executed by a processor, or in a combination of the two.
  • Software modules may be located in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

In order to accurately identify whether an object to be subjected to detection is a living body, the present invention relates to a living body detection method and apparatus, an electronic device and a storage medium. The method comprises: determining a sound signal to be subjected to detection and an image to be subjected to detection which corresponds to said sound signal, wherein said image is an image including the face of an object to be subjected to detection; determining a sound source position corresponding to said sound signal, and determining a lip position of said object on the basis of said image; and performing a consistency comparison on the sound source position and the lip position, and determining a living body detection result of said object according to the comparison result.

Description

Living body detection method and apparatus, electronic device, and storage medium

References to related applications

This disclosure claims the full benefit of the invention patent application No. 202211077263.3, titled "Living body detection method and apparatus, electronic device, and storage medium", filed with the State Intellectual Property Office of the People's Republic of China on September 5, 2022, the entire contents of which are incorporated herein by reference.

Field

The present disclosure relates to the field of biometric identification technology, and specifically to a living body detection method and apparatus, an electronic device, and a storage medium.

Background

Liveness detection is a method of determining the true physiological characteristics of a subject in certain identity verification scenarios. In face recognition applications, liveness detection uses combined actions such as blinking, opening the mouth, shaking the head, and nodding, together with vital-sign detection techniques such as facial key point positioning and face tracking, to verify whether the user is a real living person, so as to resist common attack methods such as photos, face swaps, masks, occlusions, and screen remakes.

Overview
The present disclosure provides a living body detection method and apparatus, an electronic device, and a storage medium.

In a first aspect, the present disclosure provides a living body detection method, the method including:

determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing the face of an object to be detected;

determining a sound source position corresponding to the sound signal to be detected, and determining a lip position of the object to be detected based on the image to be detected;

comparing the sound source position and the lip position for consistency, and determining a living body detection result of the object to be detected based on the comparison result.

In some embodiments, comparing the sound source position and the lip position for consistency includes:

determining a reference space area based on the lip position;

determining whether the sound source position is located within the reference space area;

when the sound source position is located within the reference space area, obtaining a comparison result indicating that the sound source position and the lip position are consistent;

when the sound source position is not located within the reference space area, obtaining a comparison result indicating that the sound source position and the lip position are inconsistent.

In some embodiments, determining the living body detection result of the object to be detected based on the comparison result includes:

when the comparison result indicates that the sound source position and the lip position are inconsistent, determining that the living body detection result of the object to be detected is a non-living body;

when the comparison result indicates that the sound source position and the lip position are consistent, determining that the living body detection result of the object to be detected is a living body.
In some embodiments, when the comparison result indicates that the sound source position and the lip position are consistent, the method further includes:

inputting the image to be detected and the sound signal to be detected into a trained mouth shape recognition model to obtain an output result of the mouth shape recognition model;

when the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected, executing the step of determining that the living body detection result of the object to be detected is a living body.

In some embodiments, before determining the sound signal to be detected, the method further includes:

outputting an interactive instruction, the interactive instruction being used to instruct the object to be detected to emit a sound signal corresponding to preset text data;

and before determining the sound source position corresponding to the sound signal to be detected, the method further includes:

performing speech recognition on the sound signal to be detected to obtain text data corresponding to the sound signal to be detected;

comparing the text data corresponding to the sound signal to be detected with the preset text data for consistency;

when the comparison result indicates that the text data corresponding to the sound signal to be detected is consistent with the preset text data, executing the step of determining the sound source position corresponding to the sound signal.

In some embodiments, the method further includes:

when the comparison result indicates that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data, returning to the step of outputting the interactive instruction.

In some embodiments, the generation process of the interactive instruction includes:

calling a preset random number generation algorithm to generate a random array;

generating the preset text data based on the random array, and generating the interactive instruction based on the preset text data.
In some embodiments, determining the sound signal to be detected includes:

acquiring sound signals collected by multiple microphones;

synthesizing the sound signals collected by each microphone to obtain the sound signal to be detected.

In some embodiments, determining the sound source position corresponding to the sound signal to be detected includes:

decomposing the sound signal to be detected to obtain multiple decomposed signals;

determining the sound source direction of each decomposed signal;

determining the intersection position of the multiple sound source directions as the sound source position of the sound signal to be detected.
In a second aspect, the present disclosure provides a living body detection apparatus, the apparatus including:

a first determination module configured to determine a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing the face of an object to be detected;

a second determination module configured to determine a sound source position corresponding to the sound signal to be detected, and determine a lip position of the object to be detected based on the image to be detected;

a third determination module configured to compare the sound source position and the lip position for consistency, and determine a living body detection result of the object to be detected based on the comparison result.

In some embodiments, the third determination module is configured to:

determine a reference space area based on the lip position;

determine whether the sound source position is located within the reference space area;

if so, obtain a comparison result indicating that the sound source position and the lip position are consistent;

if not, obtain a comparison result indicating that the sound source position and the lip position are inconsistent.

In some embodiments, the third determination module is configured to:

determine that the living body detection result of the object to be detected is a non-living body if the comparison result indicates that the sound source position and the lip position are inconsistent;

determine that the living body detection result of the object to be detected is a living body if the comparison result indicates that the sound source position and the lip position are consistent.
In some embodiments, the apparatus further includes:

an input module configured to input the image to be detected and the sound signal to be detected into a trained mouth shape recognition model to obtain an output result of the mouth shape recognition model when the comparison result indicates that the sound source position and the lip position are consistent;

a first execution module configured to execute the step of determining that the living body detection result of the object to be detected is a living body if the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.

In some embodiments, the apparatus further includes:

an output module configured to output an interactive instruction before determining the sound signal to be detected, the interactive instruction being used to instruct the object to be detected to emit a sound signal corresponding to preset text data;

a recognition module configured to perform speech recognition on the sound signal to be detected before determining the sound source position corresponding to the sound signal to be detected, to obtain text data corresponding to the sound signal to be detected;

a comparison module configured to compare the text data corresponding to the sound signal to be detected with the preset text data for consistency;

a second execution module configured to execute the step of determining the sound source position corresponding to the sound signal if the comparison result indicates that the text data corresponding to the sound signal to be detected is consistent with the preset text data.

In some embodiments, the apparatus further includes:

a third execution module configured to return to executing the step of outputting the interactive instruction if the comparison result indicates that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data.

In some embodiments, the output module is configured to:

call a preset random number generation algorithm to generate a random array;

generate the preset text data based on the random array, and generate the interactive instruction based on the preset text data.
在某些实施方案中,第一确定模块配置为:In some embodiments, the first determining module is configured as:
获取多个麦克风采集到的声音信号;Obtain sound signals collected by multiple microphones;
将每个麦克风采集到的声音信号进行合成处理,得到待检测声音信号。The sound signals collected by each microphone are synthesized and processed to obtain the sound signal to be detected.
In some embodiments, the second determination module is configured to:
decompose the sound signal to be detected to obtain multiple decomposed signals;
determine the sound source direction of each decomposed signal; and
determine the intersection of the multiple sound source directions as the sound source position of the sound signal to be detected.
In a third aspect, the present disclosure provides an electronic device, including a processor and a memory, where the processor is configured to execute a living body detection program stored in the memory to implement the living body detection method described in the present disclosure.
In a fourth aspect, the present disclosure provides a storage medium storing one or more programs, where the one or more programs can be executed by one or more processors to implement the living body detection method described in the present disclosure.
In some embodiments, according to the living body detection method of the present disclosure, the sound source position of the sound signal to be detected and the lip position of the object to be detected can be located directly. When the sound source position and the lip position are determined to be consistent, the sound signal to be detected was emitted from the lips of the object to be detected, and the object to be detected is a living body; otherwise, the object to be detected is a non-living body. In this way, even if a non-authenticated user impersonates an authenticated user by obtaining a video image of the authenticated user, the object to be detected can still be identified as a non-living body, which improves the accuracy and reliability of living body detection results.
Brief Description of the Drawings
Figure 1 is a schematic diagram of an application scenario according to an embodiment of the present disclosure;
Figure 2 is a schematic diagram of an application scenario according to an embodiment of the present disclosure;
Figure 3 is a flowchart of a living body detection method provided by an embodiment of the present disclosure;
Figure 4 is a flowchart of a living body detection method provided by an embodiment of the present disclosure;
Figure 5 is a diagram of the microphone layout on a living body detection device provided by an embodiment of the present disclosure;
Figure 6 is a schematic diagram of a living body detection device provided by an embodiment of the present disclosure determining a sound source position through microphones;
Figure 7 is a schematic diagram of a sound source position provided by an embodiment of the present disclosure;
Figure 8 is a schematic diagram of a sound source position provided by an embodiment of the present disclosure;
Figure 9 is a flowchart of a living body detection method provided by an embodiment of the present disclosure;
Figure 10 is a flowchart of a living body detection method provided by an embodiment of the present disclosure;
Figure 11 is a block diagram of a living body detection apparatus provided by an embodiment of the present disclosure;
Figure 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Refer to Figure 1, which is a schematic diagram of an application scenario according to an embodiment of the present disclosure.
The application scenario shown in Figure 1 includes a user 11 and a living body detection device 13. The living body detection device 13 may be any of various electronic devices with a display screen on which a living body detection system is installed, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. In this embodiment, the display screen can be used to display the video signal captured by the camera and to indicate the position of the face of the user to be detected. In Figure 1, a display screen is used as an example to represent the living body detection device 13.
In the application scenario shown in Figure 1, the user 11 can undergo living body detection in the normal way, meaning that the user 11 stands in front of the camera of the living body detection device 13. The living body detection device 13 can then directly collect, through the camera, a video image containing the face of the user 11, and perform living body detection on the user 11 based on that video image.
Refer to Figure 2, which is a schematic diagram of an application scenario according to an embodiment of the present disclosure. The application scenario includes a user 11, a user 12, a living body detection device 13, a terminal 14, and a terminal 15, where the terminal 14 and the terminal 15 can communicate over a network.
The terminal 14 and the terminal 15 may be hardware devices or software that support network connections and provide various network services. When the terminal 14 and the terminal 15 are hardware, they may be any of various electronic devices with a display screen, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers; Figure 2 takes smartphones as an example. When the terminal 14 and the terminal 15 are software, they may be installed on the electronic devices listed above. In this embodiment, the terminal 14 and the terminal 15 establish a video call by each installing a corresponding application.
In the application scenario shown in Figure 2, assume that the user 11 is a non-authenticated user and the user 12 is an authenticated user. When the user 11 wants to pass the living body detection of the living body detection device 13, the user 11 can make a video call with the user 12 through the terminal 14 and the terminal 15, or play through the terminal 14 a pre-recorded video image containing the face of the user 12. At this time, the video image of the user 12 is displayed on the display screen of the terminal 14.
In some embodiments, the user 11 can point the display screen of the terminal 14 toward the camera of the living body detection device 13, so that the living body detection device 13 receives the video image of the user 12 through the camera. Since the living body detection device 13 can detect vital signs from the video image of the user 12, the detection is passed. It can be seen that, in the prior art, a non-authenticated user can impersonate an authenticated user by obtaining a video image of the authenticated user and thereby pass living body detection.
On this basis, an embodiment of the present disclosure provides a living body detection method to prevent a non-authenticated user who impersonates an authenticated user by obtaining a video image of the authenticated user from passing living body detection, thereby improving the accuracy of living body detection results.
The living body detection method provided by the present disclosure is further explained below with reference to the accompanying drawings through specific embodiments; these embodiments do not constitute a limitation on the embodiments of the present disclosure.
Refer to Figure 3, which is a flowchart of a living body detection method provided by an embodiment of the present disclosure. In some embodiments, the process shown in Figure 3 can be applied to a living body detection device, such as the living body detection device 13 shown in Figure 1. As shown in Figure 3, the process may include the following steps:
Step 301: determine a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, where the image to be detected is an image containing the face of an object to be detected;
Step 302: determine a sound source position corresponding to the sound signal to be detected, and determine a lip position of the object to be detected based on the image to be detected;
Step 303: compare the sound source position and the lip position for consistency, and determine a living body detection result of the object to be detected according to the comparison result.
In this embodiment, the sound signal to be detected is the sound signal received through the microphone when the living body detection device performs living body detection. The image to be detected is an image containing the face of the object to be detected, collected through the camera when the living body detection device performs living body detection. There may be one or more images to be detected; when there are multiple images to be detected, they may refer to multiple images in a video collected by the living body detection device through the camera.
Since both the sound signal to be detected and the image to be detected are obtained when the living body detection device performs living body detection, the sound signal to be detected and the image to be detected are said to correspond to each other.
In an exemplary application scenario, the object to be detected is a real object. For example, in the application scenario shown in Figure 1 above, the user 11 is the object to be detected. In this case, the living body detection device 13 can directly collect an image containing the face of the user 11 through the camera to obtain the image to be detected. At the same time, the user 11 can directly emit a sound signal, which the living body detection device receives through the microphone, thereby determining the sound signal to be detected.
In another exemplary application scenario, the object to be detected is a virtual object. For example, in the application scenario shown in Figure 2 above, the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected. In this case, the living body detection device 13 can collect an image containing the face of the user 12 through the camera. At the same time, the user 12 can emit a sound signal, which is collected by the terminal 15 and sent to the terminal 14, and the terminal 14 can play the sound signal through its speaker; in this way, the living body detection device 13 can receive the sound signal to be detected through the microphone. Alternatively, the user 11 can emit a sound signal, and the living body detection device 13 can receive the sound signal to be detected through the microphone.
In some embodiments, the living body detection device may be provided with multiple microphones, usually arranged at different positions; for example, four microphones may be provided, one at each of the four corners of the living body detection device. When determining the sound signal to be detected, the living body detection device can acquire the sound signals collected by the multiple microphones, obtaining multiple sound signals.
In some embodiments, the living body detection device can synthesize the sound signals collected by the microphones and determine the synthesized sound signal as the sound signal to be detected. In this way, noise can be eliminated, and a clearer and more accurate sound signal can be obtained.
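As a minimal sketch of this synthesis step, the per-microphone signals can be combined by sample-wise averaging, which attenuates uncorrelated noise while preserving the common speech component. A real device would more likely use delay-and-sum beamforming with per-channel time alignment; the embodiments do not fix the algorithm, so averaging here is an assumption for illustration.

```python
def synthesize_channels(channels):
    """Combine time-aligned per-microphone signals into the sound signal
    to be detected by sample-wise averaging. A stand-in for the
    unspecified synthesis processing: uncorrelated noise partially
    cancels across channels."""
    if not channels:
        raise ValueError("need at least one channel")
    length = len(channels[0])
    if any(len(c) != length for c in channels):
        raise ValueError("channels must be time-aligned and of equal length")
    return [sum(c[i] for c in channels) / len(channels) for i in range(length)]
```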
In some embodiments, the living body detection device can acquire the sound signal from any one of the above microphones and use that sound signal as the sound signal to be detected.
Step 302 and step 303 are described together below.
In some embodiments, the living body detection device can locate the sound source position corresponding to the sound signal to be detected; how this is done is explained below through the process shown in Figure 4 and is not detailed here.
In some embodiments, since the image to be detected contains the face of the object to be detected, the living body detection device can determine the lip position of the object to be detected by performing facial recognition on the image to be detected.
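One rough way to derive a lip position from a detected face is to place it at a fixed fraction of the face bounding-box height. The 0.75 fraction below is an anthropometric assumption for illustration only; the embodiments leave the facial-recognition method open, and a production system would instead use a facial-landmark model.

```python
def estimate_lip_position(face_box):
    """face_box is (x, y, w, h) in image coordinates from a face
    detector. The lips are assumed to sit on the vertical midline at
    roughly 3/4 of the way down the face box -- a heuristic, not the
    method prescribed by the embodiments."""
    x, y, w, h = face_box
    return (x + w / 2.0, y + 0.75 * h)
```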
In some embodiments, the sound source position and the lip position can be compared for consistency, and the living body detection result of the object to be detected can be determined according to the comparison result. Specifically, when the comparison result indicates that the sound source position and the lip position are inconsistent, the sound signal to be detected was not emitted from the lips of the object to be detected, meaning that the living body detection result of the object to be detected is non-living.
In an exemplary application scenario, assume that the object to be detected is a virtual object. For example, in the application scenario shown in Figure 2 above, the video image of the user 12 output on the display screen of the terminal 14 is the object to be detected; in this case, the living body detection device 13 can collect an image containing the video image of the user 12 through the camera.
At the same time, the user 12 can emit a sound signal, which is collected by the terminal 15 and sent to the terminal 14, and the terminal 14 can play the sound signal through its speaker. The speaker of the terminal 14 may be located at the top, bottom, left, or right of the housing of the terminal 14; that is, the sound source position of the sound signal may be at the top, bottom, left, or right of the housing of the terminal 14. As shown in Figure 7, the sound source position may be located in any one of regions A, B, C, D, or E, which is inconsistent with the lip position of the object to be detected. Alternatively, the user 11 can emit a sound signal, in which case the sound source position is at the location of the user 11, which is also inconsistent with the lip position of the object to be detected. In either case, the living body detection result of the object to be detected can be determined to be non-living.
Conversely, when the comparison result indicates that the sound source position and the lip position are consistent, the sound signal to be detected was emitted from the lips of the object to be detected, meaning that the living body detection result of the object to be detected is a living body.
In another exemplary application scenario, the object to be detected is a real object. For example, in the application scenario shown in Figure 1 above, the user 11 is the object to be detected. In this case, the living body detection device 13 can directly collect an image containing the face of the user 11 through the camera to obtain the image to be detected. At the same time, the user 11 can directly emit a sound signal, which the living body detection device receives through the microphone, thereby determining the sound signal to be detected.
Since the sound signal to be detected is emitted directly from the lips of the object to be detected, the sound source position shown in Figure 8 can be obtained. As shown in Figure 8, the sound source position of the sound signal to be detected is located in region F, which is known to be the lip region of the object to be detected. This is consistent with the lip position of the object to be detected, so the living body detection result of the object to be detected can be determined to be a living body.
In some embodiments, the sound signal to be detected and the corresponding image to be detected containing the face of the object to be detected are determined; then, the sound source position of the sound signal to be detected is determined, and the lip position of the object to be detected is determined based on the image to be detected; the sound source position and the lip position are compared for consistency, and the living body detection result of the object to be detected is determined according to the comparison result. In these embodiments, the sound source position of the sound signal to be detected and the lip position of the object to be detected can be located directly. When the sound source position and the lip position are determined to be consistent, the sound signal to be detected was emitted from the lips of the object to be detected, and the object to be detected is a living body; otherwise, the object to be detected is a non-living body. In this way, even if a non-authenticated user impersonates an authenticated user by obtaining a video image of the authenticated user, the object to be detected can still be identified as a non-living body, which improves the accuracy and reliability of living body detection results.
Refer to Figure 4, which is a flowchart of a living body detection method provided by an embodiment of the present disclosure. On the basis of the process shown in Figure 3 above, the process shown in Figure 4 describes how the living body detection device locates the sound source position of the sound signal to be detected. As shown in Figure 4, the process may include the following steps:
Step 401: decompose the sound signal to be detected to obtain multiple decomposed signals;
Step 402: determine the sound source direction of each decomposed signal;
Step 403: determine the intersection of the multiple sound source directions as the sound source position of the sound signal to be detected.
Each decomposed signal may be the sound signal in any one direction among the sound signals in different directions contained in the sound signal to be detected.
In some embodiments, in order to locate the sound source position of the sound signal to be detected more accurately, the living body detection device can use microphone array signal processing technology to determine the sound source position of the sound signal to be detected.
On this basis, the living body detection device may be provided with N microphones. Here, in order to determine the sound source position of the sound signal to be detected in three-dimensional space more accurately, N may be greater than or equal to 3. Figure 5 shows the microphone layout on a living body detection device provided by an embodiment of the present disclosure. As can be seen from Figure 5, the detection device is provided with four microphones, one at each corner: microphone 1, microphone 2, microphone 3, and microphone 4.
As described in step 301 above, when multiple microphones are provided on the living body detection device, the sound signals collected by the multiple microphones can be acquired and synthesized to obtain the sound signal to be detected.
On this basis, when determining the sound source position of the sound signal to be detected, the sound signal to be detected can be decomposed to obtain multiple decomposed signals, where each decomposed signal may correspond to one microphone.
Step 402 and step 403 are described together below.
As described in step 401 above, each decomposed signal may correspond to one microphone. Since each microphone can locate the sound source direction of a sound signal when receiving it, the sound source direction of each decomposed signal can be determined through the microphone corresponding to that decomposed signal.
On this basis, after the sound signal to be detected is decomposed into multiple decomposed signals, the sound source direction of each decomposed signal can be determined. In some embodiments, the intersection of the multiple sound source directions can be determined as the sound source position of the sound signal to be detected.
Assume that one microphone is provided at each of the four corners of the living body detection device. As shown in Figure 6, each microphone corresponds to one decomposed signal of the sound signal to be detected and can determine the sound source direction of that decomposed signal, yielding four sound source directions. The four sound source directions intersect at point A, so point A can be determined as the sound source position of the sound signal to be detected.
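The fusion of per-microphone directions into one position can be sketched as a least-squares intersection of rays. In 2-D, each ray is a microphone position plus a unit direction toward the source, and the point minimizing the summed squared distance to all rays solves a small linear system. This is one standard way to realize the "intersection of sound source directions" step; the embodiments do not prescribe the numerical method.

```python
def intersect_directions(rays):
    """rays: list of ((px, py), (dx, dy)) pairs, each a microphone
    position and a unit direction toward the source. Returns the point
    minimizing the summed squared distance to all rays -- the exact
    intersection when the directions are consistent."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in rays:
        # I - d d^T projects out the component along the ray direction
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11; a12 += m12; a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("directions are parallel; no unique intersection")
    # solve the 2x2 normal equations by Cramer's rule
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With more than two microphones, the least-squares formulation also absorbs small per-microphone direction errors instead of requiring the rays to meet exactly.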
In some embodiments, the sound signal to be detected is decomposed to obtain multiple decomposed signals; the sound source direction of each decomposed signal is then determined, and the intersection of the multiple sound source directions is determined as the sound source position of the sound signal to be detected, so that the sound source position of the sound signal to be detected is located more accurately.
Refer to Figure 9, which is a flowchart of a living body detection method provided by an embodiment of the present disclosure. As shown in Figure 9, the process may include the following steps:
Step 901: determine a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, where the image to be detected is an image containing the face of an object to be detected;
Step 902: determine a sound source position corresponding to the sound signal to be detected, and determine a lip position of the object to be detected based on the image to be detected;
Step 903: determine a reference spatial region based on the lip position;
Step 904: determine whether the sound source position is within the reference spatial region; if so, execute step 906; if not, execute step 905;
Step 905: determine that the living body detection result of the object to be detected is non-living;
Step 906: input the image to be detected and the sound signal to be detected into a trained mouth shape recognition model to obtain the output result of the mouth shape recognition model;
Step 907: determine whether the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected; if so, execute step 908; if not, execute step 905;
Step 908: determine that the living body detection result of the object to be detected is a living body.
For detailed descriptions of step 901 and step 902, refer to the descriptions of steps 301 and 302 above, which are not repeated here.
Steps 903 to 905 are described together below.
In this embodiment, a reference spatial region can first be determined based on the lip position. In some embodiments, a spherical or cuboid region centered on the lip position can be set as the reference spatial region. Then, it is determined whether the sound source position is within the reference spatial region: if so, the sound source position and the lip position can be determined to be consistent; if not, they can be determined to be inconsistent.
This processing can absorb errors in locating the sound source position, as well as inaccuracies in locating the lip position caused by changes in the posture of the object to be detected while emitting the sound signal, thereby making the consistency comparison result more accurate.
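A sketch of this tolerance check, using a sphere centered on the lip position. The 0.15 m radius is an illustrative assumption: the embodiments name spherical or cuboid regions but fix no size.

```python
def positions_consistent(source_pos, lip_pos, radius=0.15):
    """Return True when the sound source position falls inside the
    reference spatial region: a sphere of the given radius centered on
    the lip position. The radius absorbs localization error and small
    posture changes; 0.15 m is an assumed value, not from the
    embodiments."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(source_pos, lip_pos))
    return dist_sq <= radius ** 2
```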
Steps 906 to 908 are described together below.
In this embodiment, the image to be detected and the sound signal to be detected can be input into the trained mouth shape recognition model to obtain an output result indicating whether the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.
In some embodiments, an output result of 1 indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected, and an output result of 0 indicates that they do not match.
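Combining the two checks of Figure 9, the final decision can be sketched as follows; the 1/0 convention follows the description above, and the helper itself is illustrative rather than part of the embodiments.

```python
def liveness_result(in_reference_region: bool, model_output: int) -> str:
    """Map the two checks of Figure 9 to a final result: the sound
    source must lie in the reference spatial region (steps 904-905), and
    the mouth shape recognition model must report a match, i.e. output 1
    (steps 907-908). Any failed check yields non-living (step 905)."""
    if not in_reference_region:
        return "non-living"
    return "living" if model_output == 1 else "non-living"
```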
在该实施例中,在上述输出结果表示待检测声音信号和待检测图像中待检测对象的口型相匹配时,可确定待检测对象的活体检测结果为活体。In this embodiment, when the above output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected, it can be determined that the living body detection result of the object to be detected is a living body.
在一示例性应用场景中,用户11与活体检测设备13处于同一环境中,那么当用户11,也即待检测对象在某一时刻发出“啊”的声音信号时,活体检测设备可以通过摄像头直接采集到当前时 刻,包含待检测对象面部的图像,得到待检测图像,以及通过麦克风接收到该“啊”的声音信号,得到待检测声音信号。In an exemplary application scenario, the user 11 and the life detection device 13 are in the same environment. Then when the user 11, that is, the object to be detected emits an "ah" sound signal at a certain moment, the life detection device can directly use the camera to Collect the current time At this moment, the image containing the face of the object to be detected is obtained to obtain the image to be detected, and the "ah" sound signal is received through the microphone to obtain the sound signal to be detected.
基于此,将上述待检测图像和待检测声音信号输入至上述口型识别模型,该口型识别模型通过对待检测图像中待检测对象的口型进行识别,可得到待检测对象的口型为“啊”的口型,说明待检测对象的口型与待检测声音信号相匹配,从而可确定待检测对象为活体。Based on this, the above-mentioned image to be detected and the sound signal to be detected are input to the above-mentioned mouth shape recognition model. By identifying the mouth shape of the object to be detected in the image to be detected, the mouth shape recognition model can obtain the mouth shape of the object to be detected as " "Ah" mouth shape indicates that the mouth shape of the object to be detected matches the sound signal to be detected, thus it can be determined that the object to be detected is a living body.
Conversely, when the output result indicates that the sound signal to be detected does not match the mouth shape of the object to be detected in the image to be detected, the living body detection result of the object to be detected can be determined to be non-living.
In the application scenario shown in Figure 2 above, the video image of user 12 output on the display screen of terminal 14 is the object to be detected; the living body detection device 13 can capture an image containing the video image of user 12 through its camera. Suppose that at a certain moment user 11 emits an "ah" sound signal while user 12 emits no sound at the same moment. The image to be detected collected by the living body detection device is then the video image of user 12, while the collected sound signal to be detected is the sound signal emitted by user 11.
The sound signal to be detected and the image to be detected are then input into the mouth shape recognition model. Since user 12 emits no sound signal at the current moment, recognizing the image to be detected yields a closed mouth shape for the object to be detected, whereas the sound signal to be detected is an "ah" sound signal, whose corresponding mouth shape should be open. The model therefore outputs a result indicating that the sound signal to be detected does not match the mouth shape of the object to be detected in the image to be detected, and the object to be detected is determined to be non-living.
In some embodiments, after it is determined that the sound source position lies within the reference space region determined based on the lip position, the sound signal to be detected and the image to be detected are input into the trained mouth shape recognition model, and whether the living body detection result of the object to be detected is a living body can be further determined from the output result. In these embodiments, whether the object to be detected is a living body is determined by further checking whether the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected. Thus, even if a non-authenticated user impersonates an authenticated user by obtaining the authenticated user's video image, the object to be detected can still be identified as non-living, improving the accuracy and reliability of the living body detection result.
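The internals of the mouth shape recognition model are not specified in this disclosure. As a minimal illustrative sketch (not the patented model), the 1/0 match output described above can be approximated by checking, frame by frame, whether mouth openness estimated from the image stream agrees with voice activity in the sound signal. The function name, inputs, and thresholds below are all assumptions introduced for illustration.

```python
def match_mouth_to_audio(mouth_openness, audio_energy,
                         open_thresh=0.3, energy_thresh=0.1):
    """Return 1 if mouth-open frames line up with voiced audio frames,
    else 0 (mirroring the model's 1/0 output described in the text).

    mouth_openness: per-frame lip-opening ratio from the image stream.
    audio_energy:   per-frame short-time energy of the sound signal.
    All names and thresholds here are illustrative, not from the patent.
    """
    agree = 0
    for m, e in zip(mouth_openness, audio_energy):
        mouth_open = m >= open_thresh
        voiced = e >= energy_thresh
        if mouth_open == voiced:
            agree += 1
    n = min(len(mouth_openness), len(audio_energy))
    # Require agreement on most frames to call it a match.
    return 1 if n and agree / n >= 0.8 else 0

# A speaker saying "ah": the mouth opens as the energy rises -> match (1).
print(match_mouth_to_audio([0.0, 0.5, 0.6, 0.1], [0.0, 0.4, 0.5, 0.05]))
# A silent on-screen face while someone off-screen speaks -> mismatch (0).
print(match_mouth_to_audio([0.0, 0.0, 0.0, 0.0], [0.0, 0.4, 0.5, 0.4]))
```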
Referring to Figure 10, a flowchart of a living body detection method provided by an embodiment of the present disclosure. As shown in Figure 10, the process may include the following steps:
Step 1001: Output an interactive instruction, the interactive instruction being used to instruct the object to be detected to emit a sound signal corresponding to preset text data.
Step 1002: Determine a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing the face of the object to be detected.
Step 1003: Perform speech recognition on the sound signal to be detected to obtain text data corresponding to the sound signal to be detected.
Step 1004: Compare the text data corresponding to the sound signal to be detected with the preset text data for consistency.
Step 1005: Determine whether the comparison result indicates that the recognized text data is consistent with the preset text data; if so, execute step 1006; if not, execute step 1001.
Step 1006: Determine the sound source position corresponding to the sound signal to be detected, and determine the lip position of the object to be detected based on the image to be detected.
Step 1007: Determine whether the sound source position and the lip position are consistent; if so, execute step 1009; if not, execute step 1008.
Step 1008: Determine that the living body detection result of the object to be detected is non-living.
Step 1009: Input the image to be detected and the sound signal to be detected into the trained mouth shape recognition model to obtain an output result of the mouth shape recognition model.
Step 1010: Determine whether the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected; if so, execute step 1011; if not, execute step 1008.
Step 1011: Determine that the living body detection result of the object to be detected is a living body.
In some embodiments, the interactive instruction can be generated by the living body detection device as follows: first, a preset random number generation algorithm is called to generate a random digit sequence; then, the preset text data is generated based on the random digit sequence, and the interactive instruction is generated based on the preset text data.
For example, the living body detection device calls the preset random number generation algorithm to generate the following random digit sequence: 265910. The living body detection device then determines this sequence as the preset text data, generates from it an interactive instruction instructing the object to be detected to say 265910, and outputs the interactive instruction.
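A minimal sketch of this challenge generation, assuming Python's `secrets` module as the "preset random number generation algorithm" (the disclosure does not name a specific algorithm):

```python
import secrets

def generate_challenge(num_digits=6):
    """Generate a random digit string for the subject to read aloud.
    Using `secrets` is one possible choice for the random number
    generation algorithm; the patent does not specify one."""
    return "".join(str(secrets.randbelow(10)) for _ in range(num_digits))

challenge = generate_challenge()
instruction = f"Please say: {challenge}"  # the interactive instruction
print(instruction)
```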
Under normal circumstances, after receiving the interactive instruction, the object to be detected speaks the preset text data as instructed, thereby producing a sound signal. The living body detection device can then receive this sound signal through its microphone and determine it as the sound signal to be detected.
Of course, in an exemplary application scenario, another object in the same environment as the living body detection device, as exemplified above, may also speak the preset text data according to the interactive instruction, thereby producing a sound signal. In this case the living body detection device likewise receives the sound signal through its microphone and determines it as the sound signal to be detected.
In some embodiments, the living body detection device can perform speech recognition on the sound signal to be detected through ASR (Automatic Speech Recognition) to obtain text data.
In some embodiments, the living body detection device can perform speech recognition on the sound signal to be detected through a convolutional neural network algorithm to obtain text data.
For example, in step 1001, when the living body detection device issues an interactive instruction instructing the object to be detected to say 265910, the object to be detected emits a corresponding sound signal according to the interactive instruction, and the living body detection device receives this sound signal through its microphone, obtaining the sound signal to be detected. The living body detection device then performs speech recognition on the sound signal to be detected and obtains the text data 265910.
Steps 1004 to 1005 are described together below:
In some embodiments, the text data corresponding to the sound signal to be detected can be compared with the preset text data for consistency. Specifically, if the comparison result indicates that the recognized text data is inconsistent with the preset text data, the object to be detected did not speak the preset text data according to the interactive instruction output by the living body detection device. In this case, to allow for hearing errors on the part of the object to be detected or recognition errors by the living body detection device, the living body detection device can regenerate the interactive instruction and output it again.
Conversely, if the comparison result indicates that the recognized text data is consistent with the preset text data, the object to be detected spoke the preset text data according to the interactive instruction output by the living body detection device. In this case, step 1006 can be executed to further determine whether the object to be detected is a living body.
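The consistency comparison of steps 1004 to 1005 can be sketched as a simple string comparison. Normalizing away spaces, punctuation, and case before comparing is an illustrative tolerance choice, not something the disclosure specifies:

```python
def texts_consistent(recognized, preset):
    """Compare ASR output with the preset text after stripping
    non-alphanumeric characters and lowercasing (assumed tolerance)."""
    def clean(s):
        return "".join(ch for ch in s if ch.isalnum()).lower()
    return clean(recognized) == clean(preset)

# ASR output with extra spaces still counts as consistent.
print(texts_consistent("2 6 5 9 1 0", "265910"))  # True
# A wrong digit is inconsistent: re-issue the interactive instruction.
print(texts_consistent("265911", "265910"))       # False
```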
For a detailed description of steps 1006 to 1008, refer to the descriptions of steps 302 and 303 above, which are not repeated here.
For a description of steps 1009 to 1011, refer to the descriptions of steps 906 to 908 above, which are not repeated here.
In some embodiments, before the sound signal to be detected is determined, an interactive instruction can first be output to instruct the object to be detected to emit a sound signal corresponding to the preset text data; after the sound signal to be detected is determined, it is recognized and the recognized text data is compared with the preset text data for consistency; if they are consistent, the step of determining the sound source position of the sound signal to be detected can be performed. In these embodiments, the interactive instruction first instructs the object to be detected to emit the sound signal corresponding to the preset text data, which makes it possible to determine preliminarily whether the object to be detected can interact with the living body detection device, and prevents the object to be detected from impersonating with a pre-recorded video image. Thus, when a non-authenticated user impersonates an authenticated user with a video image recorded in advance by the authenticated user, the object to be detected can be identified as non-living more quickly and accurately, improving the accuracy and reliability of the living body detection result.
Referring to Figure 11, a block diagram of a living body detection apparatus 110 provided by an embodiment of the present disclosure. As shown in Figure 11, the apparatus includes:
a first determination module 111, configured to determine a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing a face of an object to be detected;
a second determination module 112, configured to determine a sound source position corresponding to the sound signal to be detected, and determine a lip position of the object to be detected based on the image to be detected;
a third determination module 113, configured to compare the sound source position and the lip position for consistency, and determine a living body detection result of the object to be detected based on a comparison result.
In some embodiments, the third determination module 113 is configured to:
determine a reference space region based on the lip position;
determine whether the sound source position is located within the reference space region;
if so, obtain a comparison result that the sound source position and the lip position are consistent;
if not, obtain a comparison result that the sound source position and the lip position are inconsistent.
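One way to realize this check, assuming the reference space region is a sphere of fixed radius around the lip position (the disclosure leaves the region's exact shape open), is:

```python
def source_in_reference_region(source_pos, lip_pos, radius=0.15):
    """Return True if the sound source lies within `radius` metres of
    the lip position. The spherical region and the 0.15 m radius are
    assumptions made for illustration only."""
    dist = sum((s - l) ** 2 for s, l in zip(source_pos, lip_pos)) ** 0.5
    return dist <= radius

# Sound source about 4 cm from the lips: consistent (living-body case).
print(source_in_reference_region((0.02, 0.03, 0.02), (0.0, 0.0, 0.0)))  # True
# Sound source 1 m away, e.g. a person off to the side: inconsistent.
print(source_in_reference_region((1.0, 0.0, 0.0), (0.0, 0.0, 0.0)))     # False
```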
In some embodiments, the third determination module 113 is configured to:
if the comparison result indicates that the sound source position and the lip position are inconsistent, determine that the living body detection result of the object to be detected is non-living;
if the comparison result indicates that the sound source position and the lip position are consistent, determine that the living body detection result of the object to be detected is a living body.
In some embodiments, the apparatus further includes (not shown in the figure):
an input module, configured to input the image to be detected and the sound signal to be detected into a trained mouth shape recognition model to obtain an output result of the mouth shape recognition model when the comparison result indicates that the sound source position and the lip position are consistent;
a first execution module, configured to execute the step of determining that the living body detection result of the object to be detected is a living body if the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected.
In some embodiments, the apparatus further includes (not shown in the figure):
an output module, configured to output an interactive instruction before the sound signal to be detected is determined, the interactive instruction being used to instruct the object to be detected to emit a sound signal corresponding to preset text data;
a recognition module, configured to perform speech recognition on the sound signal to be detected before the sound source position corresponding to the sound signal to be detected is determined, to obtain text data corresponding to the sound signal to be detected;
a comparison module, configured to compare the text data corresponding to the sound signal to be detected with the preset text data for consistency;
a second execution module, configured to execute the step of determining the sound source position corresponding to the sound signal if the comparison result indicates that the text data corresponding to the sound signal to be detected is consistent with the preset text data.
In some embodiments, the apparatus further includes (not shown in the figure):
a third execution module, configured to return to the step of outputting the interactive instruction if the comparison result indicates that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data.
In some embodiments, the output module is configured to:
call a preset random number generation algorithm to generate a random digit sequence;
generate the preset text data based on the random digit sequence, and generate the interactive instruction based on the preset text data.
In some embodiments, the first determination module 111 is configured to:
obtain sound signals collected by a plurality of microphones;
synthesize the sound signals collected by each of the microphones to obtain the sound signal to be detected.
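A minimal sketch of this synthesis step, using sample-wise averaging of the channels. The disclosure does not fix the combination method; delay-and-sum beamforming would be another common choice:

```python
def synthesize_microphones(channels):
    """Combine per-microphone sample streams into one signal by
    sample-wise averaging. Averaging is one simple 'synthesis' choice
    assumed here; the patent does not specify the method."""
    n = min(len(ch) for ch in channels)  # align to the shortest channel
    return [sum(ch[i] for ch in channels) / len(channels) for i in range(n)]

mics = [[0.0, 1.0, 0.0],   # microphone 1 samples
        [0.0, 0.8, 0.2]]   # microphone 2 samples
print(synthesize_microphones(mics))  # [0.0, 0.9, 0.1]
```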
In some embodiments, the second determination module 112 is configured to:
decompose the sound signal to be detected to obtain a plurality of decomposed signals;
determine a sound source direction of each of the decomposed signals;
determine an intersection position of the plurality of sound source directions as the sound source position of the sound signal to be detected.
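For two microphones in a plane, the intersection of the two sound source directions can be computed in closed form. The 2-D, two-direction case below is a simplification for illustration; with more than two directions, a least-squares intersection would typically be used:

```python
def intersect_directions(p1, d1, p2, d2):
    """Intersect two 2-D lines, each given by a microphone position p
    and a direction d toward the source, by solving
    p1 + t1*d1 = p2 + t2*d2 for t1 via Cramer's rule.
    Returns the intersection point, or None if the directions are
    parallel (no unique intersection)."""
    det = d1[0] * (-d2[1]) - (-d2[0]) * d1[1]
    if abs(det) < 1e-12:
        return None
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (rx * (-d2[1]) - (-d2[0]) * ry) / det
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Two mics at (-1, 0) and (1, 0), both hearing a source at (0, 1):
print(intersect_directions((-1.0, 0.0), (0.7071, 0.7071),
                           (1.0, 0.0), (-0.7071, 0.7071)))
# -> approximately (0.0, 1.0)
```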
Figure 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. The electronic device 1200 shown in Figure 12 includes: at least one processor 1201, a memory 1202, at least one network interface 1204, and a user interface 1203. The components of the electronic device 1200 are coupled together through a bus system 1205. It can be understood that the bus system 1205 is used to implement connection and communication between these components. In addition to a data bus, the bus system 1205 also includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all labeled as the bus system 1205 in Figure 12.
The user interface 1203 may include a display, a keyboard, or a pointing device (for example, a mouse, a trackball, a touch pad, or a touch screen).
It can be understood that the memory 1202 in the embodiments of the present disclosure may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memory 1202 described herein is intended to include, but is not limited to, these and any other suitable types of memory.
In some embodiments, the memory 1202 stores the following elements, executable units, or data structures, or a subset or extended set thereof: an operating system 12021 and application programs 12022.
The operating system 12021 contains various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs 12022 contain various applications, such as a media player and a browser, for implementing various application services. A program implementing the method of an embodiment of the present disclosure may be included in the application programs 12022.
In an embodiment of the present disclosure, by calling a program or instructions stored in the memory 1202 (in some embodiments, a program or instructions stored in the application programs 12022), the processor 1201 is used to execute the method steps provided by the method embodiments, including, for example:
determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing a face of an object to be detected;
determining a sound source position corresponding to the sound signal to be detected, and determining a lip position of the object to be detected based on the image to be detected;
comparing the sound source position and the lip position for consistency, and determining a living body detection result of the object to be detected based on a comparison result.
The method disclosed in the above embodiments of the present disclosure may be applied to the processor 1201 or implemented by the processor 1201. The processor 1201 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 1201 or by instructions in the form of software. The above processor 1201 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present disclosure. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present disclosure may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in a decoding processor. The software unit may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1202, and the processor 1201 reads the information in the memory 1202 and completes the steps of the above method in combination with its hardware.
It can be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by units that perform the functions described herein. The software code may be stored in a memory and executed by a processor. The memory may be implemented in the processor or external to the processor.
The electronic device provided by the present disclosure may be the electronic device shown in Figure 12, and can execute all the steps of the living body detection methods in Figures 3 to 4 and Figures 9 to 10, thereby achieving the technical effects of the living body detection methods shown in those figures. For details, refer to the related descriptions of Figures 3 to 4 and Figures 9 to 10; for brevity, they are not repeated here.
Embodiments of the present disclosure also provide a storage medium (a computer-readable storage medium). The storage medium stores one or more programs. The storage medium may include a volatile memory, such as a random access memory; the memory may also include a non-volatile memory, such as a read-only memory, a flash memory, a hard disk, or a solid-state drive; the memory may also include a combination of the above types of memory.
The one or more programs in the storage medium can be executed by one or more processors to implement the above living body detection method executed on the electronic device side.
The processor is used to execute a living body detection program stored in the memory to implement the following steps of the living body detection method executed on the electronic device side:
determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing a face of an object to be detected;
determining a sound source position corresponding to the sound signal to be detected, and determining a lip position of the object to be detected based on the image to be detected;
comparing the sound source position and the lip position for consistency, and determining a living body detection result of the object to be detected based on a comparison result.
Those skilled in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present disclosure.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above further describe the objectives, technical solutions, and beneficial effects of the present disclosure in detail. It should be understood that the above are only specific embodiments of the present disclosure and are not intended to limit the protection scope of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (12)

  1. A living body detection method, the method comprising:
    determining a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing a face of an object to be detected;
    determining a sound source position corresponding to the sound signal to be detected, and determining a lip position of the object to be detected based on the image to be detected;
    comparing the sound source position and the lip position for consistency, and determining a living body detection result of the object to be detected based on a comparison result.
  2. The method according to claim 1, wherein comparing the sound source position and the lip position for consistency comprises:
    determining a reference spatial region based on the lip position;
    determining whether the sound source position is located within the reference spatial region;
    when the sound source position is located within the reference spatial region, obtaining a comparison result indicating that the sound source position and the lip position are consistent; and
    when the sound source position is not located within the reference spatial region, obtaining a comparison result indicating that the sound source position and the lip position are inconsistent.
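The consistency check in claim 2 reduces to a point-in-region test: build a reference region around the detected lip position and ask whether the estimated sound source position falls inside it. A minimal sketch, assuming an axis-aligned box as the reference region (the function name, coordinate convention, and the `margin` half-width are illustrative, not specified by the claims):

```python
def compare_positions(source_pos, lip_pos, margin=0.1):
    """Return True (consistent) when the sound source position lies
    within the reference spatial region, modeled here as an
    axis-aligned box of half-width `margin` centered on the lip
    position; otherwise return False (inconsistent)."""
    return all(abs(s - l) <= margin for s, l in zip(source_pos, lip_pos))
```

A replayed recording played from a loudspeaker beside the screen would place the source outside this region, which is the spoofing case the claim targets.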
  3. The method according to claim 1 or 2, wherein determining the living body detection result of the object to be detected according to the comparison result comprises:
    when the comparison result indicates that the sound source position and the lip position are inconsistent, determining that the living body detection result of the object to be detected is non-living; and
    when the comparison result indicates that the sound source position and the lip position are consistent, determining that the living body detection result of the object to be detected is living.
  4. The method according to claim 3, wherein when the comparison result indicates that the sound source position and the lip position are consistent, the method further comprises:
    inputting the image to be detected and the sound signal to be detected into a trained mouth shape recognition model to obtain an output result of the mouth shape recognition model; and
    when the output result indicates that the sound signal to be detected matches the mouth shape of the object to be detected in the image to be detected, performing the step of determining that the living body detection result of the object to be detected is living.
  5. The method according to any one of claims 1 to 4, wherein before determining the sound signal to be detected, the method further comprises:
    outputting an interactive instruction, the interactive instruction being used to instruct the object to be detected to emit a sound signal corresponding to preset text data; and
    before determining the sound source position corresponding to the sound signal to be detected, the method further comprises:
    performing speech recognition on the sound signal to be detected to obtain text data corresponding to the sound signal to be detected;
    comparing the text data corresponding to the sound signal to be detected with the preset text data for consistency; and
    when the comparison result indicates that the text data corresponding to the sound signal to be detected is consistent with the preset text data, performing the step of determining the sound source position corresponding to the sound signal.
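The text comparison in claim 5 can be sketched as a normalized string match between the speech recognition output and the preset challenge text. The normalization steps chosen here (Unicode NFKC, case folding, stripping non-alphanumeric characters) are assumptions for illustration; the claim only requires a consistency comparison:

```python
import unicodedata

def texts_consistent(recognized, preset):
    """Compare recognized text against the preset challenge text,
    tolerating case, spacing, and punctuation differences introduced
    by the speech recognizer."""
    def norm(s):
        s = unicodedata.normalize("NFKC", s).lower()
        return "".join(ch for ch in s if ch.isalnum())
    return norm(recognized) == norm(preset)
```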
  6. The method according to claim 5, further comprising:
    when the comparison result indicates that the text data corresponding to the sound signal to be detected is inconsistent with the preset text data, returning to the step of outputting the interactive instruction.
  7. The method according to claim 5 or 6, wherein generating the interactive instruction comprises:
    invoking a preset random number generation algorithm to generate a random array; and
    generating the preset text data based on the random array, and generating the interactive instruction according to the preset text data.
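The generation process in claim 7 can be sketched with a cryptographically strong random source, so that a pre-recorded reply cannot anticipate the challenge. The digit count and prompt wording below are assumptions, not part of the claims:

```python
import secrets

def generate_interaction(n_digits=4):
    """Generate a random digit array, derive the preset challenge text
    from it, and wrap the text in an interactive instruction for the
    subject to read aloud."""
    digits = [secrets.randbelow(10) for _ in range(n_digits)]
    preset_text = " ".join(str(d) for d in digits)
    instruction = f"Please read the following digits aloud: {preset_text}"
    return digits, preset_text, instruction
```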
  8. The method according to any one of claims 1 to 7, wherein determining the sound signal to be detected comprises:
    acquiring sound signals collected by a plurality of microphones; and
    synthesizing the sound signals collected by each of the microphones to obtain the sound signal to be detected.
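One simple reading of the synthesis in claim 8 is sample-wise averaging of time-aligned channels (a zero-delay delay-and-sum); a real system might instead apply per-channel delays or beamforming weights. A sketch with assumed names, on plain Python lists:

```python
def synthesize(channels):
    """Combine the per-microphone signals into a single sound signal
    to be detected by averaging corresponding samples across all
    channels (channels are assumed equal-length and time-aligned)."""
    n = len(channels)
    return [sum(samples) / n for samples in zip(*channels)]
```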
  9. The method according to any one of claims 1 to 8, wherein determining the sound source position corresponding to the sound signal to be detected comprises:
    decomposing the sound signal to be detected to obtain a plurality of decomposed signals;
    determining a sound source direction of each of the decomposed signals; and
    determining the intersection position of the plurality of sound source directions as the sound source position of the sound signal to be detected.
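The intersection step in claim 9 can be sketched in two dimensions: each decomposed signal yields a direction-of-arrival ray from a known microphone position, and the crossing point of the rays is taken as the source position. The function name and the (origin, azimuth) ray parameterization are illustrative assumptions:

```python
import math

def intersect_directions(p1, theta1, p2, theta2):
    """Intersect two sound source direction rays, each given as an
    origin point and an azimuth in radians, and return the crossing
    point as the estimated sound source position. Returns None when
    the directions are (near-)parallel and no intersection exists."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    # Solve p1 + t*d1 = p2 + s*d2 for t via Cramer's rule.
    t = ((p2[0] - p1[0]) * d2[1] - (p2[1] - p1[1]) * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])
```

With more than two directions, a practical system would take a least-squares point closest to all rays rather than a pairwise intersection.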
  10. A living body detection apparatus, the apparatus comprising:
    a first determination module, configured to determine a sound signal to be detected and an image to be detected corresponding to the sound signal to be detected, the image to be detected being an image containing the face of an object to be detected;
    a second determination module, configured to determine a sound source position corresponding to the sound signal to be detected, and to determine a lip position of the object to be detected based on the image to be detected; and
    a third determination module, configured to compare the sound source position and the lip position for consistency, and to determine a living body detection result of the object to be detected according to the comparison result.
  11. An electronic device, comprising a processor and a memory, the processor being configured to execute a living body detection program stored in the memory to implement the living body detection method according to any one of claims 1 to 9.
  12. A storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the living body detection method according to any one of claims 1 to 9.
PCT/CN2023/109776 2022-09-05 2023-07-28 Living body detection method and apparatus, electronic device, and storage medium WO2024051380A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211077263.3 2022-09-05
CN202211077263.3A CN115171227B (en) 2022-09-05 2022-09-05 Living body detection method, living body detection device, electronic apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2024051380A1 true WO2024051380A1 (en) 2024-03-14

Family

ID=83481566

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/109776 WO2024051380A1 (en) 2022-09-05 2023-07-28 Living body detection method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN115171227B (en)
WO (1) WO2024051380A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171227B (en) * 2022-09-05 2022-12-27 深圳市北科瑞声科技股份有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106911630A (en) * 2015-12-22 2017-06-30 上海仪电数字技术股份有限公司 Terminal and the authentication method and system of identity identifying method, terminal and authentication center
CN107767137A (en) * 2016-08-23 2018-03-06 中国移动通信有限公司研究院 A kind of information processing method, device and terminal
CN110210196A (en) * 2019-05-08 2019-09-06 北京地平线机器人技术研发有限公司 Identity identifying method and device
CN112560554A (en) * 2019-09-25 2021-03-26 北京中关村科金技术有限公司 Lip language-based living body detection method, device and storage medium
CN115171227A (en) * 2022-09-05 2022-10-11 深圳市北科瑞声科技股份有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4815661B2 (en) * 2000-08-24 2011-11-16 ソニー株式会社 Signal processing apparatus and signal processing method
KR20130101943A (en) * 2012-03-06 2013-09-16 삼성전자주식회사 Endpoints detection apparatus for sound source and method thereof
JP6030032B2 (en) * 2013-08-30 2016-11-24 本田技研工業株式会社 Sound processing apparatus, sound processing method, and sound processing program
US9996732B2 (en) * 2015-07-20 2018-06-12 International Business Machines Corporation Liveness detector for face verification
CN106709402A (en) * 2015-11-16 2017-05-24 优化科技(苏州)有限公司 Living person identity authentication method based on voice pattern and image features
CN107422305B (en) * 2017-06-06 2020-03-13 歌尔股份有限公司 Microphone array sound source positioning method and device
CN113138367B (en) * 2020-01-20 2024-07-26 中国科学院上海微系统与信息技术研究所 Target positioning method and device, electronic equipment and storage medium
CN113743160A (en) * 2020-05-29 2021-12-03 北京中关村科金技术有限公司 Method, apparatus and storage medium for biopsy
CN114252844A (en) * 2021-12-24 2022-03-29 中北大学 Passive positioning method for single sound source target

Also Published As

Publication number Publication date
CN115171227B (en) 2022-12-27
CN115171227A (en) 2022-10-11

Similar Documents

Publication Publication Date Title
US11093773B2 (en) Liveness detection method, apparatus and computer-readable storage medium
US20150294433A1 (en) Generating a screenshot
US10241990B2 (en) Gesture based annotations
WO2024051380A1 (en) Living body detection method and apparatus, electronic device, and storage medium
CN107831902B (en) Motion control method and device, storage medium and terminal
CN110785996B (en) Dynamic control of camera resources in a device with multiple displays
WO2020119032A1 (en) Biometric feature-based sound source tracking method, apparatus, device, and storage medium
WO2020051971A1 (en) Identity recognition method, apparatus, electronic device, and computer-readable storage medium
US10278001B2 (en) Multiple listener cloud render with enhanced instant replay
WO2021169616A1 (en) Method and apparatus for detecting face of non-living body, and computer device and storage medium
WO2022100690A1 (en) Animal face style image generation method and apparatus, model training method and apparatus, and device
JP2020520576A5 (en)
WO2021120190A1 (en) Data processing method and apparatus, electronic device, and storage medium
CN110427849B (en) Face pose determination method and device, storage medium and electronic equipment
WO2019052053A1 (en) Whiteboard information reading method and device, readable storage medium and electronic whiteboard
WO2023173686A1 (en) Detection method and apparatus, electronic device, and storage medium
WO2021103609A1 (en) Method and apparatus for driving interaction object, electronic device and storage medium
US10079028B2 (en) Sound enhancement through reverberation matching
JP4934158B2 (en) Video / audio processing apparatus, video / audio processing method, video / audio processing program
Dingli et al. Turning homes into low-cost ambient assisted living environments
CN109903054B (en) Operation confirmation method and device, electronic equipment and storage medium
CN115454287A (en) Virtual digital human interaction method, device, equipment and readable storage medium
US11120524B2 (en) Video conferencing system and video conferencing method
US10812898B2 (en) Sound collection apparatus, method of controlling sound collection apparatus, and non-transitory computer-readable storage medium
JP2023036273A (en) Information processing apparatus and information processing program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23862081

Country of ref document: EP

Kind code of ref document: A1