WO2021218695A1 - Monocular camera-based liveness detection method, device, and readable storage medium - Google Patents


Info

Publication number
WO2021218695A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
living body
body detection
feature
neural network
Prior art date
Application number
PCT/CN2021/088272
Other languages
French (fr)
Chinese (zh)
Inventor
Guo Hongwei (郭宏伟)
Li Hui (李辉)
Ma Jieyan (马杰延)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021218695A1 publication Critical patent/WO2021218695A1/en

Classifications

    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches (G Physics; G06 Computing; G06F Electric digital data processing; G06F18/00 Pattern recognition; G06F18/20 Analysing; G06F18/24 Classification techniques)
    • G06N3/04 Architecture, e.g. interconnection topology (G06N Computing arrangements based on specific computational models; G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks)
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of data processing, and in particular to a living body detection method, device and readable storage medium based on a single-camera RGB image.
  • the embodiment of the present application provides a living body detection method based on a monocular camera, which can ensure the accuracy of living body detection without adding additional costs.
  • In a first aspect, a living body detection method is provided. The method is applied to an electronic device and may include: acquiring a first image, where the first image is an RGB image and includes a face image of a target object; obtaining a first depth image according to the first image and a depth image generation network; determining a living body detection result according to the first image, the first depth image, and a detection network, where the living body detection result is used to indicate whether the target object is a living body; and performing an action according to the living body detection result.
  • The technical solution provided by the above first aspect generates a depth image from an RGB image, and then performs living body detection based on the RGB image and the depth image.
  • This method can effectively defend against attacks in living body detection and improves the accuracy of living body detection, without adding extra equipment to obtain the depth image, thereby effectively reducing cost.
  • The depth image generation network includes a first neural network and a second neural network. Obtaining the first depth image according to the first image and the depth image generation network specifically includes: extracting the coarse-grained feature of the first image through the first neural network; extracting the fine-grained feature of the first image through the second neural network; and generating the first depth image according to the coarse-grained feature and the fine-grained feature.
  • Generating the first depth image according to the coarse-grained feature and the fine-grained feature includes: acquiring a fused feature, where the fused feature is obtained by fusing the coarse-grained feature and the fine-grained feature through a fusion algorithm; and generating the first depth image according to the fused feature.
  • the first neural network and the second neural network are lightweight convolutional neural networks.
  • The detection network includes a third neural network and a fourth neural network. Determining the living body detection result according to the first image, the first depth image, and the detection network specifically includes: extracting the feature of the first image through the third neural network; extracting the feature of the first depth image through the fourth neural network; obtaining a feature map according to the feature of the first image and the feature of the first depth image; and determining the living body detection result according to the feature map.
  • Determining the living body detection result according to the feature map specifically includes: performing global pooling on the feature map to obtain a global feature; and determining the living body detection result according to the global feature.
  • the method further includes: acquiring a second image; acquiring the face image in the second image according to a face detection algorithm; and determining the first image according to the face image.
  • the first image is a face image that has been aligned and preprocessed.
  • performing an action according to the result of the living body detection comprises: when the result of the living body detection indicates that the target object is a living body, performing portrait tracking on the target object.
  • Performing an action based on the result of the living body detection includes: when the result of the living body detection indicates that the target object is a living body, determining whether the target object is a child; and when the target object is a child, switching to a child mode.
  • The depth image generation network and the detection network belong to a living body detection neural network, and the living body detection neural network is obtained by joint training based on local features and global features.
  • An electronic device is provided, which has the functions of implementing the method described in any one of the possible implementations of the first aspect above.
  • the function can be implemented by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • An electronic device is provided, including one or more processors, a memory, a camera, and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions which, when executed by the electronic device, cause the electronic device to execute any one of the possible implementations of the first aspect above.
  • a computer-readable storage medium including computer instructions, which when executed on an electronic device, cause the electronic device to execute any of the possible implementation manners of the first aspect described above.
  • A chip is provided, which is coupled with a memory in an electronic device, so that when running, the chip invokes program instructions stored in the memory to cause the electronic device to execute any one of the possible implementations of the first aspect above.
  • FIG. 1a is an application scenario of a living body detection method provided by an embodiment of this application.
  • FIG. 1b is an application scenario of another living body detection method provided by an embodiment of the application.
  • FIG. 1c is an application scenario of another living body detection method provided by an embodiment of the application.
  • FIG. 1d is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • FIG. 1e is a schematic diagram of a convolutional neural network provided by an embodiment of this application.
  • FIG. 2 is a schematic flowchart of a method for training a living body detection model provided by an embodiment of the application.
  • FIG. 3 is a schematic diagram of generating a depth image based on coarse-to-fine according to an embodiment of the application.
  • FIG. 4 is a schematic diagram of a local feature training provided by an embodiment of this application.
  • FIG. 5 is a flowchart of applying a living body detection model provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of another electronic device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of another electronic device provided by an embodiment of the application.
  • Words such as "exemplary" or "for example" are used as examples, instances, or illustrations. Any embodiment or design solution described as "exemplary" or "for example" in the embodiments of the present application should not be construed as more preferable or advantageous than other embodiments or design solutions. Rather, words such as "exemplary" or "for example" are used to present related concepts in a specific manner.
  • Each pixel of an RGB image has 3 values representing color; that is, a wide variety of colors can be obtained by varying and superimposing the three components red, green, and blue.
  • A depth image, also called a range image, refers to an image in which the distance from an image collector, such as a camera, to each point in the scene is used as the pixel value of that pixel.
  • the depth image directly reflects the geometry of the visible surface of the subject.
  • Monocular camera generally refers to a camera.
  • the monocular camera can only take one type of image at the same time.
  • Binocular cameras generally refer to two cameras, which can acquire two different types of images at the same time.
  • a binocular camera can simultaneously acquire RGB images and depth images.
  • the binocular camera may be a 3D camera, including a color camera and a depth sensor.
  • Deep neural network is a framework of deep learning that can provide modeling for complex nonlinear systems. In other words, deep neural networks can systematically classify data.
  • A convolutional neural network (CNN) is composed of one or more convolutional layers and a fully connected layer at the top, and also includes associated weights and pooling layers. A convolutional neural network is a bottom-up structure that uses a multi-layer network and abstracts layer by layer: on the basis of the layer below, each layer abstracts higher-level feature representations that handle various invariances.
  • a convolutional neural network may include a convolutional layer, a pooling layer, and a fully connected layer. In some cases, the convolutional neural network can also be connected to a loss layer.
  • The convolutional layer is a set of parallel feature maps, which are composed by sliding different convolution kernels over the input image and running certain operations. At each sliding position, an element-wise product-and-sum operation is run between the convolution kernel and the input image to project the information onto an element of the feature map. For example, for an RGB image, the convolutional layer can convert the image into a feature map.
  • the pooling layer is a non-linear form of downsampling, which is used to pool the feature map. Pooling can have a variety of different forms of non-linear pooling functions, such as max pooling and average pooling. Maximum pooling is to divide the input image into several rectangular areas, and output the maximum value for each sub-area. For example, the pooling layer can pool the feature map to reduce the number of features in the feature map.
  • The fully connected layer is used for high-level reasoning in the neural network. Consider, for example, a 32*32 image of a handwritten "2". The human eye immediately recognizes the handwritten "2" as the number 2, but an electronic device must feed all pixels of the picture into the neural network for processing before it can be recognized. If all pixels were input directly into fully connected layers for processing, the amount of data would be extremely large; for the 32*32 image above, 1.6 billion parameters may be obtained. In this case, the image can be preprocessed first, and only then input to the fully connected layer for recognition. For example, the fully connected layer can map the features processed by the convolutional layer and the pooling layer into a one-dimensional feature vector.
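  • As an illustration of this flow, below is a minimal sketch (assumed layer sizes, not the network of this application) of how a convolutional layer and a pooling layer shrink a 32*32 image before the fully connected layer maps the flattened features to class scores:

```python
import torch
import torch.nn as nn

# Minimal illustrative CNN: all layer sizes are assumptions for a 32x32
# grayscale digit image, not the network described in this application.
class TinyDigitNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)  # 1x32x32 -> 8x32x32
        self.pool = nn.MaxPool2d(2)                            # 8x32x32 -> 8x16x16
        self.fc = nn.Linear(8 * 16 * 16, 10)                   # flatten -> 10 digit classes

    def forward(self, x):
        x = self.pool(torch.relu(self.conv(x)))
        x = x.flatten(1)        # map features into a one-dimensional feature vector
        return self.fc(x)

logits = TinyDigitNet()(torch.randn(1, 1, 32, 32))  # shape (1, 10)
```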
  • the fully connected layer can also be connected to the loss function layer.
  • the loss function layer can be used to determine the difference between the predicted result and the real result during the neural network training process.
  • Various loss functions are suitable for different types of tasks.
  • the Softmax function can map the output of multiple neurons in the neural network to the (0,1) interval for classification and calculation.
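  • For example, a generic softmax implementation (a sketch, not code from this application) looks as follows:

```python
import numpy as np

# Softmax maps raw neuron outputs to the (0, 1) interval so that they can
# be read as class probabilities summing to 1.
def softmax(z):
    z = z - np.max(z)    # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 0.5, 1.0])))  # approx. [0.63, 0.14, 0.23]
```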
  • the process of training a neural network is a process of continuously reducing the loss by adjusting the parameters in the neural network.
  • Liveness detection here refers to determining, from a static picture, whether the subject is a real person or a photo, without requiring the user to perform actions such as shaking the head or blinking for recognition.
  • Figure 1a shows a scene of living body detection.
  • When the electronic device 102 wants to verify the user identity, it needs to determine whether the current operator is the real user 104 or a face photo 103 of the user 104, so as to prevent others from holding the photo 103 of the user 104 to obtain the authority of the user 104 and thereby harm the interests of the user 104. For example, when another person obtains the electronic device 102 of the user 104 together with a photo of the user 104, it should be impossible to unlock the screen of the electronic device 102 or complete functions such as payment by using the photo of the user 104.
  • the existing living body detection is mainly divided into two schemes: single mode and multi-mode.
  • single mode refers to the use of images acquired by the same imaging device for living body detection.
  • The single-modal solution inputs an RGB image into a neural network to extract features for classification, compares the features with previously saved user facial features, and finally determines whether the subject is a living body. Its main characteristics are simplicity, speed, and low training and deployment costs.
  • Multi-modality refers to the use of images acquired by different imaging devices for face matching.
  • a multi-modal solution fuses RGB images and corresponding multi-modal data, such as infrared images, depth images, etc., and uses neural networks to extract depth features for live detection. The depth image can be obtained through a binocular camera or other specific equipment.
  • the advantage of the multi-modal scheme is that it has high accuracy and is not easy to be attacked.
  • Both of these living body detection approaches have certain shortcomings. Due to the lack of other types of data, a single-modal living body detection scheme can easily recognize photos, masks, etc. as the user himself, so its recognition accuracy is not high. Other single-modal living body detection solutions improve accuracy by, in addition to using static RGB images, verifying liveness through user actions such as blinking and turning the head; however, this requires the user to make a specified action, and the user experience is poor. Although the multi-modal living body detection scheme has high detection accuracy, multi-modal data is not easy to obtain and multiple types of cameras are required, resulting in high cost. At the same time, training neural networks on multi-modal data is more complicated.
  • this application provides a living body detection method based on a single-camera RGB image. Specifically, after the electronic device obtains the RGB image, it can determine whether the portrait in the RGB image is a living body through the living body detection neural network. Among them, the living body detection neural network can generate a depth image based on the RGB image, and then perform feature fusion between the RGB image and the depth image, and determine whether the portrait in the RGB image is alive according to the fused features.
  • the embodiments of the present application provide a living body detection method, which does not require additional equipment to obtain a multi-modal image, and at the same time, can ensure the accuracy of living body detection.
  • Fig. 1b shows an application scenario of an embodiment of the present application, which is mainly applied to portrait tracking of video calls.
  • the electronic device 101 has a camera 105.
  • the camera 105 can capture RGB images.
  • During a video call, the electronic device 101 can determine, through the single-camera RGB-image liveness detection technology provided in this embodiment of the application, whether the user is actually in front of the camera 105, and adjust the captured picture so that the user's image is centered on the screen.
  • Figure 1c shows another application scenario of an embodiment of the present application, which is mainly applied in a large-screen child mode.
  • the electronic device 101 has a camera 105.
  • the electronic device 101 has a child mode, and the camera 105 can determine whether the person currently watching the screen is a child, so as to switch to the child mode and display the program watched by the child.
  • an image 106 of a child may be hung on the wall opposite to the electronic device 101.
  • the electronic device 101 may misrecognize the image 106 as the child himself and switch to the child mode, which seriously affects the user's experience.
  • the living body detection technology provided by the embodiments of the present application can effectively detect that the image 106 is not a real person through a camera, thereby avoiding false triggering of the child mode.
  • Fig. 1b and Fig. 1c are only exemplary; the embodiments of the present application provide a living body detection method using a single camera, and any scenario in which living body detection can be used, such as payment, access control, etc., can apply the technical solutions provided in this application.
  • The electronic device in the embodiments of the present application may be a portable electronic device that also contains other functions such as personal digital assistant and/or music player functions, such as a mobile phone, a tablet computer, or a wearable electronic device with a wireless communication function (such as a smart watch).
  • Portable electronic devices include, but are not limited to, portable electronic devices running various operating systems, smart screens with cameras, TVs with cameras, etc.
  • the aforementioned portable electronic device may also be other portable electronic devices, such as a laptop computer with a touch-sensitive surface (such as a touch panel). It should also be understood that in some other embodiments of the present application, the above-mentioned electronic device may not be a portable electronic device, but a desktop computer with a touch-sensitive surface (such as a touch panel).
  • Figure 1d exemplarily shows a schematic structural diagram of an electronic device.
  • the electronic device is a mobile phone for illustration.
  • The illustrated electronic device is only an example; the electronic device may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration.
  • the various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • The mobile phone may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the processor 110 may include one or more processing units.
  • The processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller can be the nerve center and command center of the mobile phone. The controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory, thereby avoiding repeated access, reducing the waiting time of the processor 110, and improving the efficiency of the system.
  • the processor 110 may be configured to execute the solution for authenticating user information in the embodiment of the present application.
  • The processor 110 can also execute the processing schemes executed by the server mentioned in the following content, such as determining the authentication security value corresponding to the operating device, for example, calculating the total authentication security value based on the M authentication security values, and so on.
  • When the processor 110 integrates different devices, such as an integrated CPU and GPU, the CPU and the GPU can cooperate to execute the method provided in the embodiment of the present application; for example, part of the algorithm is executed by the CPU and the other part by the GPU, to obtain faster processing efficiency.
  • the processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the wireless communication function of the mobile phone can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • In the embodiments of the present application, the wireless communication function of the mobile phone enables communication between electronic devices, as well as between the electronic device and a server.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the mobile phone can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G, etc., which are applied to mobile phones.
  • the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • The mobile communication module 150 can receive electromagnetic waves via the antenna 1, perform processing such as filtering and amplifying on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • The wireless communication module 160 can provide wireless communication solutions applied to the mobile phone, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic waves through the antenna 2 and radiate it out.
  • the antenna 1 of the mobile phone is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the mobile phone can communicate with the network and other devices through wireless communication technology.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include the global positioning system (GPS), among others.
  • the mobile phone realizes the display function through GPU, display screen 194, and application processor.
  • the GPU is an image processing microprocessor, which is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, and the like.
  • the display screen 194 includes a display panel.
  • The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the mobile phone can realize the shooting function through ISP, camera 193, video codec, GPU, display 194 and application processor.
  • the camera 193 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other formats of image signals.
  • the mobile phone may include one or more cameras 193.
  • The camera 193 may be used to collect the facial information of the user.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the mobile phone.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store computer executable program code, and the executable program code includes instructions.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area can store the operating system, at least one application program (such as sound playback function, image playback function, etc.) required by at least one function.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the mobile phone.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the processor 110 executes various functional applications and data processing of the mobile phone by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • The microphone 170C, also called a "mike" or "mic", is used to convert sound signals into electrical signals.
  • The user can make a sound with the mouth close to the microphone 170C, inputting the sound signal into the microphone 170C.
  • the mobile phone can be equipped with at least one microphone 170C.
  • the mobile phone may be equipped with two microphones 170C, which can realize noise reduction function in addition to collecting sound signals.
  • the mobile phone can also be equipped with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • The microphone 170C may be used to collect the user's voiceprint information.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • The mobile phone can use the collected fingerprint characteristics to implement fingerprint unlocking, access the application lock, take photos with the fingerprint, answer calls with the fingerprint, and so on.
  • a fingerprint sensor can be arranged on the front of the mobile phone (below the display 194), or on the back of the mobile phone (below the rear camera).
  • the fingerprint recognition function can also be realized by configuring the fingerprint sensor in the touch screen, that is, the fingerprint sensor can be integrated with the touch screen to realize the fingerprint recognition function of the mobile phone.
  • the fingerprint sensor may be configured in the touch screen, may be a part of the touch screen, or may be configured in the touch screen in other ways.
  • the fingerprint sensor can also be implemented as a full panel fingerprint sensor.
  • the touch screen can be regarded as a panel that can collect fingerprints at any position.
  • The fingerprint sensor may process the collected fingerprint (for example, verify whether the fingerprint matches) and send the result to the processor 110, and the processor 110 performs corresponding processing according to the fingerprint processing result.
  • the fingerprint sensor may also send the collected fingerprint to the processor 110, so that the processor 110 can process the fingerprint (for example, fingerprint verification, etc.).
  • The fingerprint sensor 180H may be used to collect the fingerprint information of the user.
  • the mobile phone may also include a Bluetooth device, a positioning device, a flashlight, a miniature projection device, a near field communication (NFC) device, etc., which will not be repeated here.
  • The living body detection neural network in the embodiment of the present application may be trained in advance. It is understandable that the living body detection neural network may include one or more convolutional neural networks, and different convolutional neural networks can implement different functions. Exemplarily, the living body detection neural network may include a depth image generation network and a detection network, and each network can include one or more types of convolutional neural networks.
  • the living body detection method provided by the embodiments of the present application performs living body detection by generating images of other modalities from RGB images. For ease of description, the embodiment of the present application takes a depth image as an example for description. It is understandable that the embodiment of the present application may also generate other types of images, such as infrared images, for living body detection. The embodiment of the application does not limit this.
  • FIG. 2 exemplarily shows a schematic flow chart of a training method for a living body detection neural network provided by an embodiment of the present application. As shown in Figure 2, the method includes:
  • S202 Acquire a first image and a first depth image corresponding to the first image.
  • When training the living body detection model, the training data needs to be obtained first.
  • the acquired training data may be the first image and the first depth image corresponding to the first image.
  • the first image is an RGB image.
  • the training data may include multiple first images and first depth images corresponding to the multiple first images.
  • the type of the first image may not be limited to an RGB image, and the image corresponding to the first image may also be another type of image, such as an infrared image.
  • As long as the first image and the image corresponding to the first image are images of different types, the requirements for the training data in the embodiment of the present application are met.
  • the following takes the first image as an RGB image and the image corresponding to the first image as the first depth image as an example.
  • the first image and the first depth image corresponding to the first image may be pictures taken of the same object.
  • a binocular camera is used to photograph user A, and the first image and the first depth image corresponding to the first image are generated at the same time. It is understandable that the first image and the first depth image can be regarded as being taken of the user A at the same angle.
  • The first depth image may also be generated using the principle of binocular stereo vision. For example, two cameras can obtain two images of the same scene from different angles at the same time, or a single camera can obtain two images of the scene from different angles at different times; then, based on the principle of parallax, the three-dimensional geometric information of the object can be restored, thereby obtaining a depth image.
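  • As a hedged illustration of the parallax principle (a standard stereo relation, not a formula stated in this application): with focal length f in pixels and baseline B between the two viewpoints, depth follows from disparity d as Z = f × B / d. The numbers below are assumptions:

```python
import numpy as np

# Sketch of depth-from-parallax: Z = f * B / d. Focal length and baseline
# values are illustrative assumptions, not parameters from this application.
def disparity_to_depth(disparity, focal_px=1000.0, baseline_m=0.06):
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > 0               # zero disparity: unmatched / infinitely far
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth                        # depth in meters

d = np.array([[40.0, 0.0], [20.0, 10.0]])
print(disparity_to_depth(d))            # 40 px disparity -> 1.5 m, 20 px -> 3 m
```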
  • The first image and the first depth image corresponding to the first image may come from a portrait database or be collected.
  • the embodiment of the present application does not limit the acquisition method of the first image and the first depth image.
  • the first image and the first depth image are both shots of the same object, for example, the same human face, and the difference is only the type of the image.
  • S204 Generate a second depth image according to the first image. The second depth image can be generated by the living body detection neural network. It is understandable that the first depth image is the original image taken by the camera, while the second depth image is generated algorithmically by extracting the features of the first image.
  • the embodiment of the present application may use the depth image generation network in the living body detection neural network to generate the second depth image.
  • the depth image generation network may include two independent convolutional neural networks, and the second depth image is generated through a coarse-to-fine (CTF) method.
  • CTF coarse-to-fine
  • one convolutional neural network can be used to extract the coarse-grained features of the first image
  • the other convolutional neural network can be used to extract the fine-grained features of the first image.
  • The coarse-grained features and fine-grained features of the first image are then fused through a fusion algorithm to generate the second depth image.
  • these two independent convolutional neural networks can be lightweight convolutional neural networks.
  • the advantage of lightweight convolutional neural networks is that they can be used on mobile devices while reducing network parameters without losing network performance.
  • The lightweight convolutional neural network may be, for example, a FeatherNet; this kind of network can guarantee both operation speed and accuracy.
  • Fig. 3 shows a method for generating a depth image based on coarse-to-fine provided by an embodiment of the present application. As shown in Figure 3, the method includes:
  • S302 Acquire the coarse-grained feature of the first image through the first neural network.
  • The first neural network may be a lightweight convolutional neural network, through which the coarse-grained feature of the first image can be obtained.
  • S304 Acquire the fine-grained feature of the first image through the second neural network.
  • The second neural network may be a lightweight convolutional neural network, through which the fine-grained feature of the first image can be obtained.
  • Coarse granularity and fine granularity are relative concepts. For example, the contour feature of the face can be defined as a coarse-grained feature, while local features of the face, such as eyebrow features, can be defined as fine-grained features.
  • the first neural network may be a deep neural network or a convolutional neural network.
  • the second neural network can be a deep neural network or a convolutional neural network.
  • step S302 may be executed after step S304, may also be executed before step S304, or may be executed simultaneously with step S304.
  • the two features can be merged to generate a second depth image.
  • the embodiment of the present application does not limit the specific algorithm of fusion.
  • the second depth image is generated by fusing the coarse granularity feature and the fine granularity feature, which helps to improve the robustness.
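  • A minimal sketch of this coarse-to-fine structure (assumed layer choices; not the actual lightweight networks of this application) could look as follows, with one branch for coarse-grained features, one for fine-grained features, and a fusion step that regresses a one-channel depth map:

```python
import torch
import torch.nn as nn

# Hedged sketch of the coarse-to-fine idea in FIG. 3; all layers are assumptions.
class CoarseToFineDepth(nn.Module):
    def __init__(self):
        super().__init__()
        # coarse branch: large receptive field via stride, then upsample back
        self.coarse = nn.Sequential(
            nn.Conv2d(3, 16, 7, stride=4, padding=3), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        # fine branch: small kernels at full resolution for local detail
        self.fine = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(32, 1, 3, padding=1)   # fused features -> depth map

    def forward(self, rgb):
        fused = torch.cat([self.coarse(rgb), self.fine(rgb)], dim=1)  # feature fusion
        return self.head(fused)

depth = CoarseToFineDepth()(torch.randn(1, 3, 112, 112))  # (1, 1, 112, 112)
```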
  • S206 Acquire a first loss value according to the first depth image and the second depth image.
  • the difference between the second depth image and the first depth image can be compared through an algorithm to determine the first loss value.
  • a scale invariant algorithm can be used to compare the difference between the second depth image and the first depth image to determine the scale invariant loss.
  • the first loss value is used to indicate the difference between the second depth image and the first depth image.
  • The smaller the first loss value, the smaller the difference between the second depth image and the first depth image.
  • As training proceeds, the living body detection neural network generates the second depth image more and more accurately, so that the second depth image gets closer and closer to the actually captured first depth image.
  • the first loss value can be made smaller and smaller, so that the second depth map generated by the living body detection neural network is getting closer and closer to the first depth map that is actually shot. It is understandable that this step is performed when training the living body detection neural network.
  • When the electronic device uses the living body detection neural network to detect a living body, there is no need to perform loss calculation on the generated depth image.
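  • One plausible formulation of such a scale-invariant comparison (in the style of Eigen et al.; the application does not spell out the exact formula, so this is an assumption) is the scale-invariant log loss:

```python
import torch

# Scale-invariant log loss between the generated (second) and captured
# (first) depth images; lam is an assumed weighting term.
def scale_invariant_loss(pred, target, lam=0.5, eps=1e-6):
    d = torch.log(pred + eps) - torch.log(target + eps)  # per-pixel log difference
    n = d.numel()
    return (d ** 2).sum() / n - lam * (d.sum() ** 2) / (n ** 2)

pred = torch.rand(1, 1, 112, 112) + 0.1     # stand-in second depth image
target = torch.rand(1, 1, 112, 112) + 0.1   # stand-in first depth image
print(scale_invariant_loss(pred, target))   # first loss value
```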
  • S208 Acquire a feature map according to the first image and the second depth image. The features of the first image and the second depth image can be extracted separately through the living body detection neural network, and then the features extracted from the two images are fused to form a feature map.
  • the features of the first image and the second depth image may be extracted by two independent feature extraction networks in the detection network.
  • the two feature extraction networks may be two identical and independent backbone networks.
  • the backbone network is a model of deep learning, which is used to extract the features of the image and give the representation of different sizes and different abstract levels of the image.
  • these two feature extraction networks can also be lightweight convolutional neural networks.
  • the feature map not only includes feature values, but also includes relative position information. For example, for a face image, the eyes, nose, and mouth are all arranged from top to bottom, and the corresponding feature values extracted are also arranged in this order.
  • the first image and the second depth image can be separately input into at least one convolutional layer, and the features are extracted and then fused, and finally a feature map is formed.
  • A tensor is a data container used to store data. For example, an RGB image can be processed into a 3D tensor, in which each pixel position holds three elements representing the red, green, and blue values of that pixel.
  • Feature fusion can be achieved through a variety of fusion algorithms.
  • the feature fusion algorithm may include an algorithm based on Bayesian decision theory, an algorithm based on sparse representation theory, or an algorithm based on deep learning theory.
  • the embodiment of the present application does not limit the specific algorithm implemented by feature fusion.
  • S210 Acquire a second loss value based on the global feature.
  • Global feature refers to the overall attributes of an image.
  • the global features can be color features, texture features, shape features, and so on. Global features are easily disturbed by the external environment.
  • After the fused feature map is obtained, it can be input into the detection network for processing. For example, global pooling is performed on the feature map through the detection network to obtain the global feature, and the global feature is then further input to the fully connected layer for living body detection.
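  • A hedged sketch of this detection stage (two independent backbones, channel-concatenation fusion, global pooling, and a living/non-living classifier; layer sizes are assumptions):

```python
import torch
import torch.nn as nn

# Sketch of S208-S210: extract features from the RGB and depth images with
# two independent backbones, fuse them into a feature map, then globally
# pool and classify. Not the exact architecture of this application.
class DetectionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_backbone = nn.Sequential(nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU())
        self.depth_backbone = nn.Sequential(nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU())
        self.gap = nn.AdaptiveAvgPool2d(1)   # global pooling -> global feature
        self.fc = nn.Linear(64, 2)           # logits for [living, non-living]

    def forward(self, rgb, depth):
        fmap = torch.cat([self.rgb_backbone(rgb), self.depth_backbone(depth)], dim=1)
        global_feat = self.gap(fmap).flatten(1)
        return self.fc(global_feat), fmap    # logits plus feature map for local training

logits, fmap = DetectionNet()(torch.randn(1, 3, 112, 112), torch.randn(1, 1, 112, 112))
```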
  • the loss function layer can be connected to the detection network, and the probability vector can be obtained through the loss function layer.
  • the second loss value is determined according to the probability vector and the label of the first image.
  • The loss function may be a softmax function. Since the input first image and first depth image can be labeled in advance for classification, the difference between the probability output by the softmax function and the label can be compared to obtain the loss.
  • the process of living body detection model training is to find the optimal model parameters so as to minimize the loss.
  • For example, for a three-class task, the label vectors can be cat [1,0,0], duck [0,1,0], and chicken [0,0,1].
  • Image A is one of the training images, and its label vector is cat[1,0,0].
  • the probability vector output by the softmax function is [0.65, 0.05, 0.3].
  • By comparing the probability vector with the label vector, a loss value can be obtained.
  • the loss can be -log(0.65).
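  • The numbers check out directly: with a one-hot label, cross-entropy reduces to -log of the probability assigned to the true class.

```python
import numpy as np

label = np.array([1, 0, 0])            # cat
prob = np.array([0.65, 0.05, 0.3])     # softmax output
loss = -np.sum(label * np.log(prob))
print(loss, -np.log(0.65))             # both approx. 0.431
```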
  • the labels of the images used to train the living body detection neural network are living body and non-living body.
  • the label vector of the living body is [1,0]
  • the label vector of the non-living body is [0,1].
  • the probability vector output by the loss function layer is [0.7, 0.3].
  • the second loss value can be determined by comparing the difference between the probability vector and the label vector through an algorithm. For example, the second loss value may be 0.3.
  • the second loss value may also be called a global learning loss (global learning loss), which is used to compare the difference between the global feature and the real value.
  • global learning loss global learning loss
  • the detection network in the living body detection neural network can remove the loss function layer after the training is completed, and only needs to output the judgment of whether it is a living body.
  • S212 Acquire a third loss value based on local features. A local feature refers to a feature extracted from a local area of the image.
  • the correlation between local features is small.
  • the local features can be eye, nose, and mouth features.
  • the local features can reflect the nuances of the image, and it is not easy to be disturbed by the external environment.
  • the processed local feature can be compared with the real value or the real label to determine the third loss value.
  • The fused features generated in step S208 are subjected to block-wise reinforced learning; that is, after local features are extracted, local feature training is performed to obtain better living body detection performance.
  • FIG. 4 shows a schematic diagram of a local feature training provided by an embodiment of the present application.
  • the feature map can be partially divided.
  • the feature map can be divided into a first part 401, a second part 402, and a third part 403.
  • the first part 401, the second part 402 and the third part 403 are pooled, convolved, and then input to the fully connected layer, and finally the third loss is determined by the loss function.
  • the embodiment of the present application does not limit the specific determination methods of pooling, convolution, full connection, and loss.
  • The convolution process may adopt a 1×1 convolution.
  • the output probability vector is used to characterize the probability of whether the feature is a living body. Then, the probability vector is compared with the label vector to obtain the third loss value.
  • the first part 401 may be an eye feature
  • the second part 402 may be a nose feature
  • the third part 403 may be a mouth feature.
  • These three parts are input into the loss function for processing, and the probabilities of the living body are output respectively.
  • The labels are living body and non-living body.
  • the probability vector output by the eye feature through the loss function layer is [0.5, 0.5]
  • the probability vector related to the nose feature is [0.6, 0.4]
  • the probability vector related to the mouth feature is [0.7, 0.3]. Since there are only living and non-living labels, if the label of the first image is a living label [1,0], it can be considered that the label vector corresponding to the eye, nose, and mouth features is also [1,0].
  • According to the probability vectors and the label vector, the third loss values obtained are 0.5, 0.4, and 0.3.
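  • A hedged sketch of this block-wise local training (splitting the feature map top-to-bottom into three parts 401/402/403 and giving each its own pooling, 1×1 convolution, fully connected layer, and loss; all sizes and the choice of cross-entropy are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of FIG. 4: per-part heads over a fused feature map. The worked
# example above reports per-part losses (e.g. 0.5, 0.4, 0.3); here they
# are summed into a single third loss value for training.
class LocalHeads(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),           # pool each part
                nn.Conv2d(channels, channels, 1),  # 1x1 convolution
                nn.Flatten(),
                nn.Linear(channels, 2),            # [living, non-living] per part
            )
            for _ in range(3)
        )

    def forward(self, fmap, label):
        parts = torch.chunk(fmap, 3, dim=2)        # split along height: eyes/nose/mouth
        return sum(F.cross_entropy(head(p), label)
                   for head, p in zip(self.heads, parts))

fmap = torch.randn(1, 64, 48, 48)
label = torch.tensor([0])                          # 0 = living body
print(LocalHeads()(fmap, label))                   # third loss value (summed)
```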
  • The purpose of local feature learning is to train the living body detection neural network, including the depth image generation network and the detection network.
  • some optimization algorithms can be used to train the living body detection model, that is, iterative learning, to minimize the loss value as much as possible.
  • the Stochastic Gradient Descent (SGD) method can be used to iteratively adjust the parameters in the living body detection model, so that the loss value calculated each time becomes less and less. Specifically, by adjusting the weight of the convolution kernel in the convolution layer, the loss value becomes smaller and smaller.
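  • Putting the pieces together, a hedged sketch of the joint SGD training loop (reusing CoarseToFineDepth, DetectionNet, LocalHeads, and scale_invariant_loss from the sketches above; equal loss weights and the synthetic batch are assumptions):

```python
import torch
import torch.nn.functional as F

gen_net, det_net, local_heads = CoarseToFineDepth(), DetectionNet(), LocalHeads()
params = (list(gen_net.parameters()) + list(det_net.parameters())
          + list(local_heads.parameters()))
optimizer = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # stochastic gradient descent

rgb = torch.randn(4, 3, 112, 112)              # stand-in first images
true_depth = torch.rand(4, 1, 112, 112) + 0.1  # stand-in first depth images
label = torch.randint(0, 2, (4,))              # 0 = living, 1 = non-living

for step in range(10):                         # iterative learning
    pred_depth = F.softplus(gen_net(rgb))      # keep generated depth positive
    logits, fmap = det_net(rgb, pred_depth)
    loss = (scale_invariant_loss(pred_depth, true_depth)  # first loss value
            + F.cross_entropy(logits, label)              # second loss value
            + local_heads(fmap, label))                   # third loss value
    optimizer.zero_grad()
    loss.backward()                            # adjusts convolution kernel weights
    optimizer.step()                           # loss shrinks over iterations
```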
  • the living body detection model can be deployed on an electronic device for living body detection.
  • The living body detection neural network training method provided in this application combines local features and global features for joint learning, which can improve the robustness of the living body detection neural network.
  • the living body detection neural network after the living body detection neural network is trained, it can be applied to electronic equipment for living body detection.
  • the electronic device 101 shown in FIG. 1b may use the living body detection neural network provided by the embodiment of the present application.
  • the electronic device 101 can turn on the camera 105, and determine whether the human face in the image captured by the camera is a living body through the living body detection neural network.
  • the electronic device can determine whether the user making the gesture is a living body through the living body detection neural network.
  • When the electronic device determines whether to switch to the child mode, it can determine, from the image taken by the camera, whether the human face in the image is a child's live face or a child's photo on the wall.
  • the living body detection neural network can be pre-set in the electronic device, or it can be downloaded to the electronic device through the server.
  • the embodiment of the present application does not limit the specific deployment mode of the living body detection model.
  • Fig. 5 shows a flow chart of applying a living body detection model provided by an embodiment of the present application. This method can be applied to the electronic equipment introduced above, and can also be applied to the electronic equipment not introduced above.
  • the electronic device 101 shown in FIG. 1b is taken as an example for description.
  • S502 The electronic device 101 acquires a current image.
  • the current image may be an image captured by the electronic device 101 through the camera 105, or an image captured by the electronic device 101 through another electronic device.
  • the electronic device 101 when the electronic device 101 needs to activate the portrait tracking or face tracking function, it may start the step of acquiring the current image.
  • the electronic device 101 may start to acquire the current image through the camera 105.
  • the current image may be an RGB image.
  • the current image can be an image taken by the camera alone, or it can be the current frame or a certain frame of image in the video that the camera continuously takes.
  • the electronic device can obtain the current image directly from the camera, or obtain the current image through the camera of another device.
  • the electronic device is physically or wirelessly connected to an independent camera, and the current image is obtained from the independent camera.
  • the electronic device can also obtain the current image through the camera of another electronic device.
  • the electronic device may also obtain the current image through the cloud server. For example, after the camera of an entrance guard captures the current image, it is transmitted to the cloud server, and then the cloud server sends the current image to the electronic device for identification.
  • the embodiment of the present application does not limit whether the acquired image is the currently captured image.
  • S504 Perform face detection on the current image, and determine at least one face.
  • Face detection is a technology that finds the position and size of a human face in any image. It can detect facial features and ignore other things such as buildings, trees, and bodies.
  • the electronic device 101 can also determine how many faces the current image contains according to the face detection algorithm. For example, the electronic device 101 may determine that three human faces are included in the current image according to a face recognition algorithm, and determine the specific positions of the three human faces in the image. In other words, the electronic device 101 can obtain data of at least one face in the current image.
  • If it is determined that the current image does not contain a human face, the process can return to step S502 to continue acquiring images.
  • If at least one human face is detected, step S506 can be continued.
  • Face detection can be implemented in many ways. For example, human faces can be recognized based on geometric features, templates, or models. The embodiment of the present application does not limit the specific algorithm of face detection.
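  • For instance, one widely available implementation (OpenCV's stock Haar-cascade detector; the application does not fix a particular algorithm, so this is only an example) finds the position and size of each face:

```python
import cv2

# Sketch of step S504 with OpenCV's bundled Haar cascade; the input path
# is an assumption.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("current_image.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Each entry gives the position and size of one detected face.
for (x, y, w, h) in faces:
    face_crop = image[y:y + h, x:x + w]
# If faces is empty, return to step S502 and keep acquiring images.
```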
  • S506 Align at least one human face.
  • Facial alignment refers to locating key facial feature points, such as eyes, nose tip, etc., according to the input face image.
  • Through face feature detection, the landmarks of a human face can be detected. Landmarks mark key positions of the face, such as the sides, corners, and contours of the face, and are used to describe the shape of the human face.
  • A series of landmark points may be obtained after landmark detection is performed on the at least one detected face. The detected landmarks and the template's landmarks are used to calculate an affine matrix H, and H is then used to directly compute the aligned face.
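  • A hedged sketch of this alignment step (the 5-point template coordinates for a 112×112 crop are a common convention and an assumption here, as is the similarity-transform estimator):

```python
import cv2
import numpy as np

# Template landmarks: eyes, nose tip, mouth corners for a 112x112 face crop.
template = np.array([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                     [41.5, 92.4], [70.7, 92.2]], dtype=np.float32)

def align_face(image, landmarks):
    """landmarks: detected 5-point float32 array of shape (5, 2)."""
    H, _ = cv2.estimateAffinePartial2D(landmarks, template)  # affine matrix H
    return cv2.warpAffine(image, H, (112, 112))              # aligned face
```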
  • the electronic device can directly input the image of the human face into the living body detection model for living body recognition.
  • S508 Perform preprocessing on at least one face.
  • In other embodiments, the living body detection model often requires input images of a uniform size. In this case, at least one face image needs to be preprocessed according to the requirements the living body detection model places on its input.
  • The preprocessing may include denoising the face image, cropping, resizing, posture rotation, and so on.
  • This step is optional. It can be understood that the embodiment of the present application does not limit the specific manner of preprocessing.
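  • As a purely illustrative sketch, assuming the detection model expects a fixed 112x112 RGB input normalized to [0, 1] (neither the size nor the normalization is specified by this embodiment), the preprocessing might look like:

```python
import cv2
import numpy as np

def preprocess_face(face_bgr, size=(112, 112)):
    """Denoise, resize, and normalize a face crop for the detection model."""
    face = cv2.resize(face_bgr, size)          # unify the input size
    face = cv2.GaussianBlur(face, (3, 3), 0)   # light denoising
    face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
    return face.astype(np.float32) / 255.0     # scale pixel values to [0, 1]
```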
  • S510 The preprocessed face image(s) can be input into the living body detection neural network for living body detection, and the living body detection result is then determined.
  • The living body detection result is used to indicate whether the face in the input face image is a living body.
  • For example, if three faces are recognized in the current image, the electronic device inputs the three face images into the living body detection neural network to determine whether the faces in the three images are living bodies.
  • If the living body detection result indicates that a living face exists, step S512 is executed.
  • If no face in the current image is a living face, the method returns to step S502 to continue acquiring the current image.
  • S512 Perform an action according to the result of the living body detection.
  • This step is optional. After the electronic device obtains the living body detection result, it can perform corresponding actions according to the living body detection result.
  • For example, a user uses the electronic device 101 to make a video call with another user.
  • The electronic device 101 obtains the current image through the camera 105. If the living body detection result shows that a face in the current image is a living body, the electronic device can start a face tracking algorithm to track the living face and adjust the portrait to the center of the video screen. For example, the face 106 in FIG. 1b can be placed in the middle of the picture. If the result shows no living body, the electronic device can continue the living body detection without adjusting the screen, or it can adjust the angle of the camera and reacquire images for living body detection.
  • The electronic device 101 may continue to perform living body detection during the video call to filter out non-living bodies. For example, during the call, the user raises the mobile phone to show the other party a photo of a third person. At this time, the electronic device 101 can determine through living body detection that the face in the photo is not a living body, and will not adjust the photo to the middle of the screen.
  • Alternatively, after the electronic device 101 determines a living face, it can use the face tracking algorithm to track that face and perform no further living body detection; when the living face disappears, living body detection is restarted.
  • In the child mode scenario, the electronic device can further determine whether the living face belongs to a child. If the electronic device determines that the living face belongs to a child, the child mode is activated. If the living body detection result shows no living face, the electronic device can continue to display the current interface, or enter the standby state to reduce power consumption.
  • the embodiments of the present application do not limit the actions performed according to the results of the living body detection, and the electronic device can perform any actions according to the results of the living body detection.
  • Tracking the face of a living body is only one of the application scenarios of the living body detection neural network provided in the embodiment of the present application. Any scenario that requires living body detection may apply the solution provided in the embodiment of the present application, for example, making payments with an electronic device, the child mode, gesture control, and so on.
  • the embodiments of the present application do not limit the application scenarios of the living body detection neural network.
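  • Putting steps S502-S512 together, a schematic pipeline could look as follows. This is a sketch only: `detect_faces`, `align_face`, and `preprocess_face` are the illustrative helpers above, while `detect_landmarks` and `liveness_net` are placeholders for any landmark detector and for the trained living body detection neural network, not real APIs of this application:

```python
def liveness_pipeline(frame_bgr, liveness_net, template):
    """Run the S502-S512 flow on one acquired frame; return per-face results."""
    results = []
    for (x, y, w, h) in detect_faces(frame_bgr):                      # S504
        crop = frame_bgr[y:y + h, x:x + w]
        aligned = align_face(crop, detect_landmarks(crop), template)  # S506
        face = preprocess_face(aligned)                               # S508
        is_live = liveness_net(face)                                  # S510
        results.append(((x, y, w, h), is_live))
    return results  # the caller acts on the results in S512
```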
  • Fig. 6 shows a flowchart of another living body detection method provided by an embodiment of the present application.
  • The living body detection method can be applied to the electronic equipment introduced above or to other electronic equipment, or it can run in a cloud server.
  • This application does not limit the specific types of electronic devices, as long as they have computing capabilities.
  • S602 Acquire a second image of the target object.
  • The target object refers to the face on which living body detection needs to be performed.
  • the second image may be an RGB image.
  • the second image can be obtained directly from the camera, or it can be a processed image.
  • the second image can be obtained in the manner of steps S502-S508 shown in FIG. 5.
  • the embodiment of the present application does not limit the specific acquisition method of the second image.
  • S604 Generate a third depth image according to the second image.
  • the second image is input into the living body detection neural network.
  • the electronic device may generate a third depth image corresponding to the second image according to the living body detection neural network.
  • the third depth image corresponding to the second image can be generated according to the depth image generation network in the living body detection neural network.
  • For the specific generation method, refer to step S204 shown in FIG. 2 and the coarse-to-fine method shown in FIG. 3.
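  • As a hedged sketch of the coarse-to-fine idea (the layer sizes, the concatenation-based fusion, and the input-size assumption are all illustrative; the embodiment only requires a coarse branch, a fine branch, and a fusion of their features):

```python
import torch
import torch.nn as nn

class DepthGenerator(nn.Module):
    """Toy coarse-to-fine depth generator: two light CNN branches, fused."""
    def __init__(self):
        super().__init__()
        # Coarse branch: aggressive downsampling captures the global layout.
        self.coarse = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=8, mode="bilinear", align_corners=False))
        # Fine branch: keeps full resolution to capture local detail.
        self.fine = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        # Fuse the concatenated features into a one-channel depth map.
        self.head = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, rgb):  # rgb: (N, 3, H, W) with H and W divisible by 8
        fused = torch.cat([self.coarse(rgb), self.fine(rgb)], dim=1)
        return self.head(fused)
```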
  • S606 Acquire a global feature according to the second image and the third depth image.
  • the features of the second image and the third depth image can be extracted according to the detection network in the living body detection neural network, and then the features of the two images are fused to form a fused feature map.
  • the detection network may include two independent backbone networks to extract the features of the second image and the third depth image respectively.
  • The global feature can then be output by means such as global pooling.
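  • A minimal sketch of such a detection network, assuming two small convolutional backbones, concatenation as the fusion operation, and global average pooling (the embodiment does not fix the backbone architecture, the fusion algorithm, or the pooling type):

```python
import torch
import torch.nn as nn

class DetectionNet(nn.Module):
    """Independent backbones for the RGB and depth inputs, fused for detection."""
    def __init__(self):
        super().__init__()
        def backbone(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.rgb_backbone = backbone(3)      # features of the second image
        self.depth_backbone = backbone(1)    # features of the third depth image
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling -> global feature
        self.fc = nn.Linear(128, 2)          # living body vs. non-living body

    def forward(self, rgb, depth):
        fused = torch.cat([self.rgb_backbone(rgb),
                           self.depth_backbone(depth)], dim=1)  # fused feature map
        global_feat = self.pool(fused).flatten(1)               # global feature
        return self.fc(global_feat)                             # detection output
```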
  • S608 Determine a living body detection result based on the global feature.
  • the living body detection result can be determined according to the detection network.
  • the live detection result is used to indicate whether the target object is alive.
  • At inference time, the detection network directly outputs the judgment of whether the target object is a living body, and does not need to pass through the loss function layer to output a probability vector.
  • the detection network in the living body detection neural network can determine whether the target object in the second image is a living body according to the second image and the third depth image corresponding to the second image.
  • The living body detection method provided by the embodiments of the present application does not require additional hardware to obtain multi-modal data; it can directly generate a depth image from an RGB image, fuse the depth image and the RGB image, and perform living body detection according to the fused features. This detection method guarantees the accuracy of living body detection without incurring additional costs.
  • the living body detection method provided by the embodiments of the present application can use a lightweight convolutional neural network in the process of generating a depth image, which greatly reduces the amount of calculation.
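  • Lightweight convolutional networks typically cut computation by replacing standard convolutions with depthwise separable ones; the MobileNet-style block below is only an illustration of that idea, since the embodiment does not name a specific lightweight architecture:

```python
import torch.nn as nn

def depthwise_separable(in_ch, out_ch):
    """A depthwise 3x3 plus pointwise 1x1 pair: far fewer multiply-adds
    than a standard 3x3 convolution with the same channel counts."""
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # depthwise 3x3
        nn.ReLU(),
        nn.Conv2d(in_ch, out_ch, 1),                          # pointwise 1x1
        nn.ReLU())
```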
  • FIG. 7 is a schematic diagram of the structure of the server provided by the embodiment of the application.
  • The server 1301 may also be a chip or circuit of an electronic device, for example, a chip or circuit that can be set in a cloud computing platform.
  • the server 1301 may further include a bus system, wherein the processor 1302, the memory 1304, and the communication interface 1303 may be connected through the bus system.
  • the aforementioned processor 1302 may be a chip.
  • The processor 1302 may be a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), or a microcontroller unit (MCU), and may also be a programmable logic device (PLD) or another integrated chip.
  • the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1302 or instructions in the form of software.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor 1302.
  • The software module can be located in a mature storage medium in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 1304, and the processor 1302 reads the information in the memory 1304, and completes the steps of the foregoing method in combination with its hardware.
  • the processor 1302 in the embodiment of the present application may be an integrated circuit chip with signal processing capability.
  • the steps of the foregoing method embodiments may be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • The above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • It can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a mature storage medium in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory 1304 in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), and electrically available Erase programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the server 1301 may include a processor 1302, a communication interface 1303, and a memory 1304.
  • The memory 1304 is used to store instructions, and the processor 1302 is used to execute the instructions stored in the memory 1304, so as to implement the related solution of the server in the method corresponding to any one of the above embodiments.
  • FIG. 8 is a schematic diagram of a server provided by an embodiment of the application.
  • The server 1501 may be a server, or a chip or circuit, such as a chip or circuit that can be installed in a server.
  • The division of the above server units is only a division of logical functions; in actual implementation, the units may be fully or partially integrated into one physical entity, or may be physically separated.
  • the transceiver unit 1503 may be implemented by the communication interface 1303 in FIG. 7 described above, and the processing unit 1502 may be implemented by the processor 1302 in FIG. 7 described above.
  • the present application also provides a computer program product.
  • the computer program product includes: computer program code.
  • When the computer program code runs on a computer, the computer executes the method of any one of the embodiments shown in FIGS. 2 to 6.
  • The embodiment of the present application also provides a computer-readable storage medium. The computer-readable medium stores program code, and when the program code runs on a computer, the computer executes the method of any one of the embodiments shown in FIGS. 2 to 6.
  • the embodiment of the present application also provides an electronic device, which includes the aforementioned server.
  • the embodiment of the present application also provides a system, which includes the aforementioned operating device, one or more authentication devices, and the aforementioned server.
  • All or part of the above embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • Computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • The usable medium can be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), or the like.
  • The server in the foregoing device embodiments corresponds to the server in the method embodiments, and the corresponding module or unit executes the corresponding steps.
  • For example, the communication unit (transceiver) executes the receiving or sending steps in the method embodiments, and steps other than sending and receiving can be executed by the processing unit (processor).
  • For the functions of specific units, refer to the corresponding method embodiments. There may be one or more processors.
  • Terms such as "component" used in this specification denote computer-related entities: hardware, firmware, a combination of hardware and software, software, or software in execution.
  • For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program, and/or a computer.
  • Both an application running on a computing device and the computing device itself can be components.
  • One or more components may reside in processes and/or threads of execution, and components may be located on one computer and/or distributed between two or more computers.
  • these components can be executed from various computer readable media having various data structures stored thereon.
  • These components can communicate through local and/or remote processes, for example, based on a signal having one or more data packets (such as data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems through the signal).
  • the disclosed system, device, and method can be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division of units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • If the function is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to make a computer device (which may be a personal computer, a server, or the like) execute all or part of the steps of the methods in the various embodiments of the present application.
  • The aforementioned storage media include various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of data processing and provides a monocular camera-based liveness detection method. An RGB image is used to determine whether the face in the image is live. The method comprises: obtaining a first image, the first image being an RGB image, and the first image comprising a face image of a target object; obtaining a first depth image according to the first image and a depth image generation network; determining a liveness detection result according to the first image, the first depth image, and a detection network, the liveness detection result being used for indicating whether the target object is live; and executing an action on the basis of the liveness detection result.

Description

Living body detection method, device, and readable storage medium based on a monocular camera
This application claims priority to the Chinese patent application No. 202010338191.8, filed with the China National Intellectual Property Administration on April 26, 2020 and entitled "Monocular camera-based liveness detection method, device, and readable storage medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of data processing, and in particular to a living body detection method, device, and readable storage medium based on a single-camera RGB image.
Background
The popularization of smart terminals and various devices with cameras has provided a broad foundation for biometric recognition based on face detection and recognition. For face detection and recognition, living body detection is usually indispensable in many scenarios, such as payment, access control, and other security-related occasions, as well as in Huawei terminal products such as the child mode on large screens and portrait tracking in video calls.
Generally, in order to improve the accuracy of living body detection, multi-modal images are introduced. In addition to common RGB images, infrared images, depth images, and the like are introduced to make up for the shortcomings of conventional visible light images. Depth images can be generated by binocular cameras and specific equipment, but this method is relatively expensive and has relatively low popularity.
Summary of the invention
The embodiment of the present application provides a living body detection method based on a monocular camera, which can ensure the accuracy of living body detection without adding extra cost.
In order to achieve the foregoing objectives, the following technical solutions are adopted in the embodiments of this application:
In a first aspect, a living body detection method is provided. The method is applied to an electronic device and may include: acquiring a first image, where the first image is an RGB image and includes a face image of a target object; obtaining a first depth image according to the first image and a depth image generation network; determining a living body detection result according to the first image, the first depth image, and a detection network, where the living body detection result is used to indicate whether the target object is a living body; and executing an action according to the living body detection result.
The technical solution provided by the first aspect generates a depth image from an RGB image and then performs living body detection based on the RGB image and the depth image. This method can effectively prevent attacks in living body detection and improve the accuracy of living body detection, without adding extra equipment to obtain the depth image, effectively reducing the cost.
In a possible implementation manner, the depth image generation network includes a first neural network and a second neural network, and obtaining the first depth image according to the first image and the depth image generation network specifically includes: extracting a coarse-grained feature of the first image through the first neural network; extracting a fine-grained feature of the first image through the second neural network; and generating the first depth image according to the coarse-grained feature and the fine-grained feature.
In a possible implementation manner, generating the first depth image according to the coarse-grained feature and the fine-grained feature includes: acquiring a fusion feature, where the fusion feature is obtained by fusing the coarse-grained feature and the fine-grained feature through a fusion algorithm; and generating the first depth image according to the fusion feature.
In a possible implementation manner, the first neural network and the second neural network are lightweight convolutional neural networks.
In a possible implementation manner, the detection network includes a third neural network and a fourth neural network, and determining the living body detection result according to the first image, the first depth image, and the detection network specifically includes: extracting features of the first image through the third neural network; extracting features of the first depth image through the fourth neural network; obtaining a feature map according to the features of the first image and the features of the first depth image; and determining the living body detection result according to the feature map.
In a possible implementation manner, determining the living body detection result according to the feature map specifically includes: performing global pooling on the feature map to obtain a global feature; and determining the living body detection result according to the global feature.
In a possible implementation manner, the method further includes: acquiring a second image; acquiring the face image in the second image according to a face detection algorithm; and determining the first image according to the face image.
In a possible implementation manner, the first image is a face image that has been aligned and preprocessed.
In a possible implementation manner, performing an action according to the living body detection result includes: when the living body detection result indicates that the target object is a living body, performing portrait tracking on the target object.
In a possible implementation manner, performing an action according to the living body detection result includes: when the living body detection result indicates that the target object is a living body, determining whether the target object is a child; and when the target object is a child, switching to a child mode.
In a possible implementation manner, the depth image generation network and the detection network belong to a living body detection neural network, and the living body detection neural network is obtained by joint training based on a local feature and the global feature.
In a second aspect, an electronic device is provided, which has the functions for implementing the method in any one of the possible implementations of the first aspect. The functions can be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above-mentioned functions.
In a third aspect, an electronic device is provided, including one or more processors, a memory, a camera, and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions; when the instructions are executed by the electronic device, the electronic device executes any one of the possible implementations of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, including computer instructions, which, when run on an electronic device, cause the electronic device to execute any one of the possible implementations of the first aspect.
In a fifth aspect, a chip is provided, which is coupled with a memory in an electronic device, so that the chip invokes, when running, the program instructions stored in the memory, causing the electronic device to execute any one of the possible implementations of the first aspect.
Description of the drawings
FIG. 1a is an application scenario of a living body detection method provided by an embodiment of this application;
FIG. 1b is an application scenario of another living body detection method provided by an embodiment of this application;
FIG. 1c is an application scenario of another living body detection method provided by an embodiment of this application;
FIG. 1d is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 1e is a schematic diagram of a convolutional neural network provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a method for training a living body detection model provided by an embodiment of this application;
FIG. 3 is a schematic diagram of generating a depth image based on coarse-to-fine according to an embodiment of this application;
FIG. 4 is a schematic diagram of local feature training provided by an embodiment of this application;
FIG. 5 is a flowchart of applying a living body detection model provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of another electronic device provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of another electronic device provided by an embodiment of this application.
Detailed description
The terms "first", "second", and "third" in the specification, claims, and drawings of this application are used to distinguish different objects, rather than to define a specific order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used as examples, illustrations, or explanations. Any embodiment or design solution described as "exemplary" or "for example" in the embodiments of the present application should not be construed as being preferable or advantageous over other embodiments or design solutions. To be precise, words such as "exemplary" or "for example" are intended to present related concepts in a specific manner.
In order to make the description of the following embodiments clear and concise, a brief introduction of related technologies is given first:
Each pixel of an RGB image has three values representing color; that is, a wide variety of colors are obtained through variations of the three colors red, green, and blue and their superposition.
A depth image, also called a range image, refers to an image in which the distance from an image collector, such as a camera, to each point in the scene is used as the pixel value of the corresponding pixel. The depth image directly reflects the geometry of the visible surface of the subject.
A monocular camera generally refers to a single camera. A monocular camera can only capture one type of image at a time.
A binocular camera generally refers to two cameras, which can acquire two different types of images at the same time. For example, a binocular camera can simultaneously acquire RGB images and depth images. Exemplarily, the binocular camera may be a 3D camera including a color camera and a depth sensor.
A deep neural network (DNN) is a deep learning framework that can model complex nonlinear systems. In other words, deep neural networks can systematically classify data.
A convolutional neural network (CNN) is composed of one or more convolutional layers and a fully connected layer at the top, and also includes associated weights and pooling layers. A convolutional neural network is a bottom-up network structure that uses multiple layers and abstracts layer by layer; each layer abstracts, on the basis of the layer below it, higher-level feature representations that are robust to various invariances. Referring to FIG. 1e, a convolutional neural network may include a convolutional layer, a pooling layer, and a fully connected layer. In some cases, the convolutional neural network can also be connected to a loss function layer (loss layer).
The convolutional layer is a set of parallel feature maps, which are formed by sliding different convolution kernels over the input image and running certain operations. In addition, at each sliding position, an element-wise multiply-and-sum operation is run between the convolution kernel and the input image to project the information onto an element of the feature map. For example, for an RGB image, the convolutional layer can convert the image into feature maps.
The pooling layer is a non-linear form of downsampling used to pool the feature maps. Pooling can take many different non-linear forms, such as max pooling and average pooling. Max pooling divides the input image into several rectangular regions and outputs the maximum value of each sub-region. For example, the pooling layer can pool a feature map to reduce the number of features in the feature map.
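A small numeric illustration of 2x2 max pooling (the kernel size and values are arbitrary examples):

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [0., 1., 2., 1.],
                    [1., 0., 1., 0.]]]])
pool = nn.MaxPool2d(kernel_size=2)  # keep the maximum of each 2x2 region
print(pool(x))  # tensor([[[[4., 8.], [1., 2.]]]])
```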
The fully connected layer is used for high-level reasoning in a neural network. Consider, for example, a 32*32 image of a handwritten "2". By looking at the image as a whole, the human eye can immediately recognize that this handwritten "2" is the digit 2. For an electronic device, however, all pixels of the image need to be fed into the neural network for processing before the digit can be recognized. If all pixels were input directly into a fully connected layer, the amount of data would be extremely large; for the above 32*32 image, 1.6 billion parameters might be obtained. In this case, the image can be preprocessed first and only then input into the fully connected layer for recognition. For example, the fully connected layer can map the features processed by the convolutional layers and pooling layers into a one-dimensional feature vector.
In order to train the neural network, the fully connected layer can also be connected to a loss function layer. The loss function layer can be used to determine the difference between the predicted result and the real result during the training process. Different loss functions are suitable for different types of tasks. For example, the Softmax function can map the outputs of multiple neurons in the neural network into the (0,1) interval for classification and calculation.
In other words, the process of training a neural network is a process of continuously reducing the loss by adjusting the parameters of the network.
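As a numerical sketch of that mapping and of the loss that training reduces (a two-class toy example, not the network of this application):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5])  # raw network outputs for (live, non-live)
probs = softmax(logits)        # about [0.82, 0.18]; both fall in (0, 1)
loss = -np.log(probs[0])       # cross-entropy loss if the true label is "live"
print(probs, loss)             # training adjusts parameters to shrink this loss
```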
As facial information recognition technology is applied more and more widely, security increasingly becomes the focus. The value of liveness detection technology lies in judging the identity of users and preventing attacks such as photos and videos. Specifically, in some scenarios, liveness detection means judging from a static picture whether the camera is facing the real user or a photo, without requiring the user to shake the head, blink, or perform other actions. Figure 1a shows one liveness detection scenario. Referring to Figure 1a, when the electronic device 102 wants to verify a user's identity, it needs to determine whether the current operator is the real user 104 or a face photo 103 of the user 104, so as to prevent others from using the photo 103 of the user 104 to obtain the user's authority and thereby harm the interests of the user 104. For example, when other people obtain the electronic device 102 of the user 104 and a photo of the user 104, they cannot use the photo of the user 104 to unlock the screen of the electronic device 102 or complete functions such as face-scan payment.
Existing living body detection is mainly divided into single-modal and multi-modal schemes. Single-modal means using images acquired by the same imaging device for living body detection. Exemplarily, a single-modal solution feeds an RGB image into a neural network to extract features for classification, then compares the features with previously saved facial features of the user, and finally determines whether there is a living body. Its main characteristics are simplicity, higher speed, and lower training and deployment costs. Multi-modal means using images acquired by different imaging devices for face matching. Exemplarily, a multi-modal solution fuses RGB images with corresponding multi-modal data, such as infrared images and depth images, and extracts deep features through a neural network for living body detection, where the depth image can be obtained through a binocular camera or other specific equipment. The advantage of the multi-modal scheme is that it has high accuracy and is not easy to attack.
However, both of these living body detection methods have certain shortcomings. For example, due to the lack of other types of data, a single-modal living body detection scheme easily recognizes photos, masks, and the like as the user himself, so its recognition accuracy is not high. In other single-modal living body detection solutions, in order to improve accuracy, besides using static RGB images, user actions such as blinking or turning the head can also be used to judge whether there is a living body; but this method requires the user to make specified actions, and the user experience is poor. Although the multi-modal living body detection scheme has high detection accuracy, multi-modal data is not easy to obtain and requires multiple types of cameras, resulting in high cost. At the same time, training a neural network with multi-modal data is more complicated.
In order to overcome the above technical problems, this application provides a living body detection method based on a single-camera RGB image. Specifically, after the electronic device obtains an RGB image, it can determine whether the portrait in the RGB image is a living body through a living body detection neural network. The living body detection neural network can generate a depth image based on the RGB image, fuse the features of the RGB image and the depth image, and determine, according to the fused features, whether the portrait in the RGB image is a living body. The living body detection method provided by the embodiments of the present application does not require additional equipment to obtain multi-modal images, while still guaranteeing the accuracy of living body detection.
Figure 1b shows an application scenario of an embodiment of the present application, mainly applied to portrait tracking in video calls. As shown in Figure 1b, the electronic device 101 has a camera 105. Exemplarily, the camera 105 can capture RGB images. When a user uses the electronic device 101 to make a video call with another user, the camera 105 can, through the single-camera RGB-image liveness detection technology provided in this embodiment of the application, judge whether the user is actually in front of the camera, adjust the captured picture, and place the user's image at the center of the picture.
Figure 1c shows another application scenario of an embodiment of the present application, mainly applied to the child mode of a large screen. As shown in Figure 1c, the electronic device 101 has a camera 105. Exemplarily, the electronic device 101 has a child mode and can judge through the camera 105 whether the person currently watching the screen is a child, so as to switch to the child mode and display programs for children. However, in some scenarios, an image 106 of a child may hang on the wall opposite the electronic device 101. As mentioned above, in the prior art it is difficult to judge with only one camera whether the image 106 is a real child. In this case, when an adult user is watching a program through the electronic device 101, the electronic device 101 may misrecognize the image 106 as a real child and switch to the child mode, seriously affecting the user's experience. The living body detection technology provided by the embodiments of the present application can effectively detect with a single camera that the image 106 is not a real person, thereby avoiding false triggering of the child mode.
It is understandable that the application scenarios shown in Figures 1b and 1c are merely exemplary. The embodiments of the present application provide a living body detection method using a single camera; any scenario in which living body detection can be used, such as payment or access control, can apply the technical solution provided in this application.
The electronic device in the embodiments of the present application may be a portable electronic device that also contains other functions such as a personal digital assistant and/or a music player function, such as a mobile phone, a tablet computer, or a wearable electronic device with a wireless communication function (such as a smart watch). Exemplary embodiments of portable electronic devices include, but are not limited to, portable electronic devices running the operating systems shown in Figure PCTCN2021088272-appb-000001 or other operating systems, smart screens with cameras, TVs with cameras, and so on. The aforementioned portable electronic device may also be another portable electronic device, such as a laptop computer with a touch-sensitive surface (for example, a touch panel). It should also be understood that, in some other embodiments of the present application, the electronic device may not be a portable electronic device but a desktop computer with a touch-sensitive surface (for example, a touch panel).
The embodiments of this application do not impose special restrictions on the specific form of the electronic device.
Figure 1d exemplarily shows a schematic structural diagram of an electronic device. In Figure 1d, a mobile phone is used for illustration.
It should be understood that the illustrated electronic device is only an example; an electronic device may have more or fewer components than shown in the figure, may combine two or more components, or may have a different component configuration. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application-specific integrated circuits.
As shown in Figure 1d, the mobile phone may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
The following describes each component of the electronic device in detail with reference to Figure 1d:
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and so on. Different processing units may be independent devices or integrated in one or more processors. The controller can be the nerve center and command center of the mobile phone; it can generate operation control signals according to instruction operation codes and timing signals to control instruction fetching and execution.
A memory may also be provided in the processor 110 to store instructions and data, for example, the correspondence between authentication devices, authentication methods, and authentication security values in this application, and the correspondence between operations and security values. In some embodiments, the memory in the processor 110 is a cache, which can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 110, and thus improving the efficiency of the system.
The processor 110 may be configured to execute the solution for authenticating user information in the embodiments of the present application. When the server is integrated in the electronic device, the processor 110 can also execute the processing solutions performed by the server mentioned below, such as determining the authentication security value corresponding to an operating device, or calculating a total authentication security value from M authentication security values. When the processor 110 integrates different devices, such as a CPU and a GPU, the CPU and the GPU can cooperate to execute the method provided in the embodiments of the present application; for example, part of the algorithm of the method is executed by the CPU and another part by the GPU, to obtain faster processing efficiency.
In some embodiments, the processor 110 may include one or more interfaces. For example, the interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
The wireless communication function of the mobile phone may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like. The wireless communication function enables the communication between electronic devices, and between an electronic device and a server, in the embodiments of the present application.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the mobile phone may cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 may provide wireless communication solutions applied to the mobile phone, including 2G/3G/4G/5G. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify a signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be disposed in the same component.
The wireless communication module 160 may provide wireless communication solutions applied to the mobile phone, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The wireless communication module 160 may be one or more components integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive signals to be sent from the processor 110, perform frequency modulation and amplification on them, and convert them into electromagnetic waves radiated through the antenna 2.
In some embodiments, the antenna 1 of the mobile phone is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the mobile phone can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The mobile phone implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor. The GPU performs the mathematical and geometric calculations used for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like.
The mobile phone can implement the shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The camera 193 is used to capture still images or videos. An object generates an optical image through the lens, and the optical image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and then passes the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the mobile phone may include one or more cameras 193. In the embodiments of the present application, the camera 193 may be used to collect the facial information of the user.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, and the executable program code includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data created during use of the mobile phone (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 110 executes the various functional applications and data processing of the mobile phone by running the instructions stored in the internal memory 121 and/or the instructions stored in the memory disposed in the processor.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The microphone 170C, also called a "mic" or "mike", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The mobile phone may be provided with at least one microphone 170C. In some other embodiments, the mobile phone may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the mobile phone may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and so on. In the embodiments of the present application, the microphone 170C may be used to collect the voiceprint information of the user.
The fingerprint sensor 180H is used to collect fingerprints. The mobile phone can use the collected fingerprint characteristics to implement fingerprint unlocking, application-lock access, fingerprint photographing, fingerprint call answering, and the like. For example, a fingerprint sensor may be arranged on the front of the mobile phone (below the display screen 194) or on the back of the mobile phone (below the rear camera). The fingerprint recognition function can also be implemented by configuring the fingerprint sensor in the touch screen, that is, the fingerprint sensor may be integrated with the touch screen to implement the fingerprint recognition function of the mobile phone. In this case, the fingerprint sensor may be configured in the touch screen, may be a part of the touch screen, or may be configured in the touch screen in another way. In addition, the fingerprint sensor may also be implemented as a full-panel fingerprint sensor, in which case the touch screen can be regarded as a panel on which fingerprints can be collected at any position. In some embodiments, the fingerprint sensor may process the collected fingerprint (for example, determine whether the fingerprint passes verification) and send the result to the processor 110, and the processor 110 performs corresponding processing according to the fingerprint processing result. In other embodiments, the fingerprint sensor may send the collected fingerprint to the processor 110 so that the processor 110 processes the fingerprint (for example, performs fingerprint verification). In the embodiments of the present application, the fingerprint sensor 180H may be used to collect the fingerprint information of the user.
Although not shown in FIG. 1d, the mobile phone may further include a Bluetooth apparatus, a positioning apparatus, a flashlight, a miniature projection apparatus, a near field communication (NFC) apparatus, and the like, which are not described in detail here.
To improve the accuracy of living body detection, the living body detection neural network in the embodiments of the present application may be trained in advance. It can be understood that the living body detection neural network may include one or more convolutional neural networks, and different convolutional neural networks may implement different functions. Exemplarily, the living body detection neural network may include a depth image generation network and a detection network, and each network may include one or more types of convolutional neural networks. The living body detection method provided in the embodiments of the present application performs living body detection by generating an image of another modality from an RGB image. For ease of description, the embodiments of the present application take a depth image as an example. It can be understood that the embodiments of the present application may also generate other types of images, such as infrared images, to perform living body detection; the embodiments of the present application do not limit this.
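For orientation only, the following minimal sketch shows one way the two sub-networks described above could be composed; the module names (TinyDepthGen, TinyDetector), layer choices, and input size are illustrative assumptions, not a reference implementation of this application.

```python
import torch
import torch.nn as nn

class TinyDepthGen(nn.Module):
    # stand-in for the depth image generation network: RGB -> 1-channel depth
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, rgb):
        return self.net(rgb)

class TinyDetector(nn.Module):
    # stand-in for the detection network: fuses RGB + generated depth
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, 2)  # living body vs. non-living body
    def forward(self, rgb, depth):
        x = torch.cat([rgb, depth], dim=1)  # multi-modal input
        return self.fc(self.backbone(x).flatten(1))

rgb = torch.randn(1, 3, 112, 112)
depth_net, det_net = TinyDepthGen(), TinyDetector()
logits = det_net(rgb, depth_net(rgb))  # shape (1, 2): liveness logits
```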
Based on the foregoing content, FIG. 2 exemplarily shows a schematic flowchart of the living body detection neural network training method provided by an embodiment of the present application. As shown in FIG. 2, the method includes:
S202: Acquire a first image and a first depth image corresponding to the first image.
When training the living body detection model, training data needs to be acquired first. Exemplarily, in the embodiments of the present application, the acquired training data may be first images and the first depth images corresponding to the first images, where a first image is an RGB image. It can be understood that the training data may include multiple first images and the first depth images corresponding to them. Meanwhile, the type of the first image is not limited to an RGB image, and the image corresponding to the first image may also be another type of image, such as an infrared image. As long as the first image and the image corresponding to the first image are images of different types, the requirements on the training data in the embodiments of the present application are met. For ease of description, the following takes the case where the first image is an RGB image and the image corresponding to the first image is a first depth image as an example.
Exemplarily, the first image and the first depth image corresponding to the first image may be pictures taken of the same object. For example, a binocular camera is used to photograph user A, and the first image and the corresponding first depth image are generated at the same time. It can be understood that the first image and the first depth image can be regarded as being captured of user A from the same angle.
In some other embodiments, the first depth image may also be generated using the principle of binocular stereo vision. For example, two cameras may simultaneously capture two images of the surrounding scene from different angles, or a single camera may capture two images of the surrounding scene from different angles at different times; the three-dimensional geometric information of the objects can then be recovered based on the parallax principle, thereby obtaining the depth image.
In some embodiments, the first image and the corresponding first depth image may come from a portrait database or be collected directly. The embodiments of the present application do not limit the manner of acquiring the first image and the first depth image.
It can be understood that, in the embodiments of the present application, the first image and the first depth image are both shots of the same object, for example, the same human face; the only difference is the type of the image.
S204: Generate a second depth image.
After the first image is obtained, the second depth image can be generated by the living body detection neural network. It can be understood that the first depth image is the original image captured by the camera, whereas the second depth image is generated algorithmically by extracting the features of the first image.
The embodiments of the present application may use the depth image generation network in the living body detection neural network to generate the second depth image.
Exemplarily, the depth image generation network may include two independent convolutional neural networks and generate the second depth image through a coarse-to-fine (CTF) method. One convolutional neural network may be used to extract the coarse-grained features of the first image, and the other convolutional neural network may be used to extract the fine-grained features of the first image. The coarse-grained features and fine-grained features of the first image are then combined by a fusion algorithm to generate the second depth image.
Further, the two independent convolutional neural networks may be lightweight convolutional neural networks. The advantage of a lightweight convolutional neural network is that the number of network parameters is reduced without losing network performance, so it can be used on mobile devices. For example, the lightweight convolutional neural network may be a FeatherNet, which guarantees both operation speed and accuracy.
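As an illustrative sketch of the coarse-to-fine idea described above, assuming two small convolutional branches and channel concatenation as the fusion step (the specific fusion algorithm is not limited here); the branch depths and channel counts are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineDepth(nn.Module):
    """Sketch of a CTF depth generator: a coarse branch at reduced
    resolution plus a fine branch at full resolution, fused into a
    single one-channel depth map."""
    def __init__(self):
        super().__init__()
        # coarse branch: runs on a downsampled input, captures global layout
        self.coarse = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
        )
        # fine branch: full resolution, captures local detail
        self.fine = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        )
        self.fuse = nn.Conv2d(32, 1, 3, padding=1)  # fusion -> depth map

    def forward(self, rgb):
        h, w = rgb.shape[-2:]
        coarse = self.coarse(F.interpolate(rgb, scale_factor=0.5))
        coarse = F.interpolate(coarse, size=(h, w))  # back to full resolution
        fine = self.fine(rgb)
        return self.fuse(torch.cat([coarse, fine], dim=1))

depth = CoarseToFineDepth()(torch.randn(1, 3, 112, 112))  # (1, 1, 112, 112)
```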
FIG. 3 shows a coarse-to-fine depth image generation method provided by an embodiment of the present application. As shown in FIG. 3, the method includes:
S302: Acquire the coarse-grained features of the first image through the first neural network.
Specifically, the first neural network may be a lightweight convolutional neural network. By inputting the first image into the first neural network, the coarse-grained features of the first image can be obtained.
S304: Acquire the fine-grained features of the first image through the second neural network.
Specifically, the second neural network may be a lightweight convolutional neural network. By inputting the first image into the second neural network, the fine-grained features of the first image can be obtained. It should be noted that coarse granularity and fine granularity are relative concepts. For example, for a face image, the contour features of the face can be defined as coarse-grained features, while the local features of the face, such as eyebrow features, can be defined as fine-grained features.
It can be understood that the first neural network may be a deep neural network or a convolutional neural network, and the second neural network may likewise be a deep neural network or a convolutional neural network.
In the embodiments of the present application, the execution order of steps S302 and S304 is not limited; step S302 may be executed after step S304, before step S304, or simultaneously with step S304.
S306: Generate the second depth image.
After the coarse-grained features and fine-grained features of the first image are obtained, the two kinds of features can be fused to generate the second depth image. The embodiments of the present application do not limit the specific fusion algorithm.
In the embodiments of the present application, the second depth image is generated by fusing the coarse-grained features and the fine-grained features, which helps to improve robustness.
S206: Acquire a first loss value according to the first depth image and the second depth image.
After the second depth image is acquired, the difference between the second depth image and the first depth image can be compared through an algorithm to determine the first loss value.
For example, a scale-invariant algorithm can be used to compare the difference between the second depth image and the first depth image to determine a scale-invariant loss.
The first loss value indicates the magnitude of the difference between the second depth image and the first depth image: the smaller the first loss value, the smaller the difference. Through continuous learning, the living body detection neural network generates increasingly accurate second depth images, so that the second depth image becomes closer and closer to the first depth image that was actually captured.
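One common form of scale-invariant depth loss (following the formulation of Eigen et al.; this application does not fix a particular variant) can be sketched as follows, where d is the difference of log-depths:

```python
import torch

def scale_invariant_loss(pred, target, lam=0.5, eps=1e-8):
    """Scale-invariant log-depth loss (Eigen et al. style).

    pred, target: depth maps of shape (B, 1, H, W) with positive values.
    loss = mean(d^2) - lam * mean(d)^2, with d = log(pred) - log(target);
    it penalizes depth differences while tolerating a global scale shift.
    """
    d = torch.log(pred.clamp(min=eps)) - torch.log(target.clamp(min=eps))
    d = d.flatten(1)  # (B, H*W)
    return (d.pow(2).mean(dim=1) - lam * d.mean(dim=1).pow(2)).mean()

# example: compare a generated depth map against the captured one
pred = torch.rand(2, 1, 112, 112) + 0.1
target = torch.rand(2, 1, 112, 112) + 0.1
print(scale_invariant_loss(pred, target))
```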
In the embodiments of the present application, through repeated training on multiple samples, the first loss value can be made smaller and smaller, so that the second depth image generated by the living body detection neural network becomes closer and closer to the actually captured first depth image. It can be understood that this step is performed when training the living body detection neural network; when the electronic device uses the living body detection neural network to detect a living body, there is no need to perform loss calculation on the generated depth image.
S208: Fuse the features of the first image and the second depth image.
After the second depth image is obtained, the features of the first image and of the second depth image can be extracted separately through the living body detection neural network, and the features extracted from the two images are then fused to form a feature map.
Exemplarily, the features of the first image and the second depth image may be extracted by two independent feature extraction networks in the detection network. Exemplarily, the two feature extraction networks may be two identical and independent backbone networks. A backbone network is a deep learning model used to extract image features and produce representations of the image at different sizes and different levels of abstraction. As another example, the two feature extraction networks may also be lightweight convolutional neural networks.
Since the first image and the second depth image are data of different modalities, this step may also be called multi-modal data fusion. The feature map includes, in addition to feature values, relative position information. For example, in a face image the eyes, nose, and mouth are arranged from top to bottom, and the extracted feature values are arranged in the same order.
In other words, the first image and the second depth image can each be input into at least one convolutional layer; the features are extracted and then fused, finally forming the feature map.
The fused features can be stored in a tensor for further learning. A tensor is a data container used to store data. For example, an RGB image can be processed into a 3D tensor in which each two-dimensional position has three elements representing the red, green, and blue values of a pixel.
Feature fusion can be implemented by a variety of fusion algorithms. For example, the feature fusion algorithm may be based on Bayesian decision theory, on sparse representation theory, or on deep learning theory. The embodiments of the present application do not limit the specific algorithm used for feature fusion.
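As an illustrative sketch of this multi-modal fusion step, assuming two identical, independent lightweight backbones and channel-wise concatenation as the fusion rule (one of several possibilities the text allows):

```python
import torch
import torch.nn as nn

def make_backbone(in_ch):
    # stand-in for an identical, independent lightweight backbone
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    )

rgb_backbone = make_backbone(3)    # extracts features of the first image
depth_backbone = make_backbone(1)  # extracts features of the generated depth image

rgb = torch.randn(1, 3, 112, 112)
depth = torch.randn(1, 1, 112, 112)
fused = torch.cat([rgb_backbone(rgb), depth_backbone(depth)], dim=1)
print(fused.shape)  # fused multi-modal feature map, e.g. (1, 128, 56, 56)
```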
S210: Acquire a second loss value based on the global features.
Global features refer to the overall attributes of an image, for example color features, texture features, and shape features. Global features are easily disturbed by the external environment.
Specifically, after the fused feature map is obtained, it can be input into the detection network for processing. For example, the detection network performs global pooling on the feature map to obtain the global features, which are then further input into a fully connected layer for living body detection.
To train the living body detection neural network, a loss function layer can be attached to the detection network, and a probability vector is obtained through the loss function layer. The second loss value is determined according to the probability vector and the label of the first image.
In some embodiments, referring to FIG. 1e, after the global features are processed by the fully connected layer, they can be further input into the loss function layer for classification, finally yielding the probability of belonging to each class. Exemplarily, the loss function may be a softmax function. Since the input first image and first depth image can be labeled in advance for classification, the difference between the probability output by the softmax function and the label can be compared to obtain the loss. The process of training the living body detection model is to find the optimal model parameters that minimize the loss.
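A minimal sketch of this global branch, assuming the fused feature map from the previous sketch and a two-class (living / non-living) output; the pooling choice and layer dimensions are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalHead(nn.Module):
    """Global pooling + fully connected classification head."""
    def __init__(self, in_ch=128, num_classes=2):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # global pooling
        self.fc = nn.Linear(in_ch, num_classes)  # fully connected layer

    def forward(self, fused_map):
        g = self.pool(fused_map).flatten(1)  # global feature vector
        return self.fc(g)                    # logits for living / non-living

fused_map = torch.randn(1, 128, 56, 56)   # from the fusion step above
logits = GlobalHead()(fused_map)
probs = F.softmax(logits, dim=1)          # probability vector, e.g. [0.7, 0.3]
label = torch.tensor([0])                 # index 0 = living body (assumed)
second_loss = F.cross_entropy(logits, label)  # softmax loss vs. the label
```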
Exemplarily, the following uses a specific example to explain how the loss is obtained from the probability and label output by softmax.
Suppose there is a set of training images with three kinds of labels: cat, duck, and chicken. The label vectors may then be cat [1,0,0], duck [0,1,0], and chicken [0,0,1]. Image A is one of the training images, and its label vector is cat [1,0,0]. After image A is fed into the living body detection neural network, the probability vector output by the softmax function is [0.65, 0.05, 0.3]. A loss value can then be obtained by comparing the probability vector with the label vector; for example, the loss may be -log(0.65).
In some embodiments, the labels of the images used to train the living body detection neural network are living body and non-living body. The label vector of a living body is [1,0], and the label vector of a non-living body is [0,1]. Suppose the label of the first image is living body [1,0]; after the first image is input into the living body detection neural network, the probability vector output by the loss function layer is [0.7, 0.3]. The second loss value can be determined by comparing the difference between the probability vector and the label vector through an algorithm; for example, the second loss value may be 0.3.
It can be understood that this example is merely illustrative; there may be many kinds of first image labels, which is not limited in the embodiments of the present application.
Since the second loss value is determined according to the global features, the second loss value may also be called the global learning loss, which is used to measure the difference between the global features and the ground truth.
It can be understood that, after training is completed, the loss function layer of the detection network in the living body detection neural network can be removed, since only the judgment of whether the object is a living body needs to be output.
S212: Acquire a third loss value based on the local features.
Local features refer to features extracted from local regions of an image; the correlation between local features is small. For example, for a face image, the local features may be the eye, nose, and mouth features respectively. Local features can reflect subtle local differences in an image and are not easily disturbed by the external environment. After the local features are determined from the feature map, they can be processed and compared with the ground truth or the ground-truth label to determine the third loss value.
When training the living body detection neural network, the fused features generated in step S208 are partitioned into blocks for reinforced learning, that is, after the local features are extracted, local feature training is performed to obtain better living body detection performance.
FIG. 4 shows a schematic diagram of local feature training provided by an embodiment of the present application.
As shown in FIG. 4, after the feature map is obtained, it can be partitioned into local blocks. Exemplarily, the feature map can be divided into a first part 401, a second part 402, and a third part 403.
The first part 401, the second part 402, and the third part 403 are each pooled and convolved, then input into the fully connected layer, and finally the third loss is determined through the loss function. The embodiments of the present application do not limit the specific implementations of the pooling, convolution, fully connected layer, and loss. Exemplarily, the convolution may use a 1*1 convolution mode. Exemplarily, after a local feature is processed by the loss function, the output probability vector characterizes the probability that the feature belongs to a living body; the probability vector is then compared with the label vector to obtain the third loss value.
Exemplarily, the first part 401 may be the eye features, the second part 402 may be the nose features, and the third part 403 may be the mouth features. These three parts are input into the loss function for processing, and each outputs a living-body probability. Suppose the labels are living body and non-living body, the probability vector output for the eye features through the loss function layer is [0.5, 0.5], the probability vector for the nose features is [0.6, 0.4], and the probability vector for the mouth features is [0.7, 0.3]. Since the labels are only living body and non-living body, if the label of the first image is the living body label [1,0], the label vectors corresponding to the eye, nose, and mouth features can also be considered to be [1,0]. The third loss values determined from the probability vectors and the label vectors then include 0.5, 0.4, and 0.3. A sketch of this block-wise branch is given below.
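A minimal sketch of the block-wise local branch, assuming the feature map is split into three horizontal strips corresponding to the three parts, each with its own pooling, 1*1 convolution, and fully connected classifier; the split rule and dimensions are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalBranch(nn.Module):
    """One local block: pooling -> 1x1 convolution -> fully connected."""
    def __init__(self, in_ch=128, num_classes=2):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, 32, kernel_size=1)
        self.fc = nn.Linear(32, num_classes)

    def forward(self, block):
        x = F.adaptive_avg_pool2d(block, 1)  # pool the local block
        x = self.conv1x1(x).flatten(1)       # 1*1 convolution
        return self.fc(x)                    # local living/non-living logits

fused_map = torch.randn(1, 128, 56, 56)
parts = torch.chunk(fused_map, 3, dim=2)   # three horizontal strips
branches = nn.ModuleList(LocalBranch() for _ in parts)
label = torch.tensor([0])                  # living body label (assumed index)
third_losses = [F.cross_entropy(b(p), label) for b, p in zip(branches, parts)]
```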
In the embodiments of the present application, the execution order of steps S210 and S212 is not limited; step S210 may be executed after step S212, before step S212, or simultaneously with step S212.
It can be understood that, after the living body detection neural network is trained, the parts related to local feature learning can be removed. In other words, local feature learning serves to train the living body detection neural network, which includes the depth image generation network and the detection network.
After the first loss value, the second loss value, and the third loss value are obtained, the living body detection model can be trained through optimization algorithms, that is, iterative learning, to make the loss values as small as possible. For example, stochastic gradient descent (SGD) with momentum can be used to iteratively adjust the parameters in the living body detection model so that the loss computed in each iteration becomes smaller and smaller. Specifically, the weights of the convolution kernels in the convolutional layers are adjusted so that the loss value decreases.
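A minimal sketch of one momentum-SGD training step is given below, reusing the scale_invariant_loss function sketched earlier; the toy model, the equal loss weights, and the learning rate are assumptions, and the third (local) loss is omitted here for brevity:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# toy stand-in: predicts a depth map and living/non-living logits from RGB
class ToyLivenessModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.depth = nn.Conv2d(3, 1, 3, padding=1)
        self.cls = nn.Linear(4 * 112 * 112, 2)
    def forward(self, rgb):
        d = torch.sigmoid(self.depth(rgb)) + 0.1         # positive depths
        g = self.cls(torch.cat([rgb, d], 1).flatten(1))  # global logits
        return d, g

model = ToyLivenessModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

rgb = torch.randn(8, 3, 112, 112)              # a toy labeled batch
true_depth = torch.rand(8, 1, 112, 112) + 0.1
label = torch.randint(0, 2, (8,))

pred_depth, g_logits = model(rgb)
loss = (scale_invariant_loss(pred_depth, true_depth)  # first loss
        + F.cross_entropy(g_logits, label))           # second loss
optimizer.zero_grad()
loss.backward()
optimizer.step()  # one momentum-SGD update of the convolution kernel weights
```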
When the first loss value, the second loss value, and the third loss value satisfy certain thresholds, the living body detection model can be deployed on an electronic device for living body detection.
The living body detection neural network training method provided in this application learns from local features and global features jointly, which can improve the robustness of the living body detection neural network.
In the embodiments of the present application, after the living body detection neural network is trained, it can be applied in an electronic device for living body detection. For example, the electronic device 101 shown in FIG. 1b may use the living body detection neural network provided in the embodiments of the present application. Specifically, when a user starts a video call with other users through the electronic device 101, the electronic device 101 can turn on the camera 105 and determine, through the living body detection neural network, whether the human face in the image captured by the camera is a living body. As another example, in a gesture recognition process, the electronic device can use the living body detection neural network to judge whether the user making the gesture is a living body. As yet another example, when deciding whether to switch to child mode, the electronic device can use the image captured by the camera to judge whether the face in the image is the live face of a child or a photo of a child on the wall.
The living body detection neural network may be preset in the electronic device, or may be downloaded to the electronic device through a server. The embodiments of the present application do not limit the specific deployment manner of the living body detection model.
FIG. 5 shows a flowchart of applying the living body detection model provided by an embodiment of the present application. The method can be applied to the electronic devices introduced above as well as to electronic devices not introduced above. For ease of description, the electronic device 101 shown in FIG. 1b is taken as an example below.
As shown in FIG. 5, the specific steps of the method include:
S502: Acquire the current image.
Referring to FIG. 1b, the electronic device 101 acquires the current image. The current image may be an image captured by the electronic device 101 through the camera 105, or an image acquired by the electronic device 101 from another electronic device.
Exemplarily, when the electronic device 101 needs to start a portrait tracking or face tracking function, the step of acquiring the current image can begin. For example, when the user starts a video call with other users through the electronic device 101, or starts using gesture operations, the electronic device 101 can start acquiring the current image through the camera 105. Exemplarily, the current image may be an RGB image.
The current image may be a single image taken by the camera, or the current frame or a certain frame of a video continuously captured by the camera.
The electronic device can obtain the current image directly from its camera, or through the camera of another device. For example, the electronic device may be connected to an independent camera physically or wirelessly and obtain the current image from that camera. Alternatively, the electronic device may obtain the current image through the camera of another electronic device, or through a cloud server. For example, after the camera of an access control system captures the current image, it is transmitted to the cloud server, and the cloud server then sends the current image to the electronic device for recognition.
It can be understood that the embodiments of the present application do not limit whether the acquired image was captured at the current moment.
S504: Perform face detection on the current image and determine at least one face.
After the electronic device 101 obtains the current image, it can judge, according to a face detection algorithm, whether the current image contains a human face. Face detection is a technology for finding the position and size of human faces in an arbitrary image; it detects facial features while ignoring other things such as buildings, trees, and bodies.
At the same time, the electronic device 101 can also determine, according to the face detection algorithm, how many faces the current image contains. For example, the electronic device 101 may determine that the current image includes three faces and determine the specific positions of these three faces in the image. In other words, the electronic device 101 can obtain the data of at least one face in the current image.
If it is determined that the current image does not contain a face, step S502 is returned to and images continue to be acquired.
If it is determined that the current image contains a face, step S506 can be performed.
Face detection can be implemented in many ways; for example, faces can be recognized based on geometric features, templates, or models. The embodiments of the present application do not limit the specific face detection algorithm.
S506: Align the at least one face.
After a face is detected, it can be aligned. Facial alignment locates the key facial feature points, such as the eyes and the tip of the nose, according to the input face image. For example, face feature detection can be used to identify the positions of different facial features. Specifically, after a face is detected, the landmarks of the face can be detected. Landmarks mark the key positions of the face, such as its edges, corners, and contours, and are used to describe the shape of the face.
In some embodiments, a series of landmark points can be obtained after landmark detection is performed on the at least one detected face. Using the detected landmarks and the landmarks of a template, an affine matrix H is computed, and the aligned face is then obtained directly using H.
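A minimal sketch of this alignment step using OpenCV; the five landmark coordinates, the 112*112 template, and the file name below are illustrative values, not a template specified by this application:

```python
import cv2
import numpy as np

# five detected landmarks (eyes, nose tip, mouth corners) -- illustrative values
detected = np.float32([[38, 52], [74, 50], [56, 72], [42, 92], [70, 90]])
# corresponding template landmarks for a 112x112 aligned face -- assumed template
template = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                       [41.5, 92.4], [70.7, 92.2]])

image = cv2.imread("face.jpg")  # hypothetical input containing the detected face
# estimate the (partial) affine matrix H mapping detected -> template landmarks
H, _ = cv2.estimateAffinePartial2D(detected, template)
aligned = cv2.warpAffine(image, H, (112, 112))  # aligned face crop
```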
This step is optional. After detecting a face, the electronic device can directly input the image containing the face into the living body detection model for living body recognition.
S508: Preprocess the at least one face.
The face images obtained through face detection and face alignment may be inconsistent in size and angle, while the living body detection model often requires input images of a uniform size. Therefore the at least one face needs to be preprocessed according to the input requirements of the living body detection model.
Exemplarily, the preprocessing may include denoising the face image, cropping, resizing, rotating the pose, and so on.
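A minimal sketch of such a preprocessing step using OpenCV; the 112*112 target size and the normalization scheme are assumptions for illustration:

```python
import cv2
import numpy as np

def preprocess_face(face_bgr, size=(112, 112)):
    """Denoise, resize, and normalize one face crop for the detection model."""
    face = cv2.GaussianBlur(face_bgr, (3, 3), 0)  # light denoising
    face = cv2.resize(face, size)                 # uniform input size
    face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)  # model expects RGB
    face = face.astype(np.float32) / 255.0        # scale to [0, 1]
    return np.transpose(face, (2, 0, 1))          # HWC -> CHW

crop = (np.random.rand(140, 120, 3) * 255).astype(np.uint8)  # stand-in crop
batch_ready = preprocess_face(crop)[None]  # shape (1, 3, 112, 112)
```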
This step is optional. It can be understood that the embodiments of the present application do not limit the specific manner of preprocessing.
S510: Determine the living body detection result according to the living body detection neural network.
After the at least one face is preprocessed, the preprocessed face image(s) can be input into the living body detection neural network for living body detection, and the living body detection result is then determined. The living body detection result indicates whether the face in the input face image is a living body.
For example, if there are three faces in the current image, the electronic device inputs, after recognition, the three face images into the living body detection neural network separately to determine whether the faces in them are living bodies.
If a face image is recognized as a living body, step S512 is performed.
If none of the faces in the current image is a live face, step S502 is returned to and the current image continues to be acquired.
S512: Perform an action according to the living body detection result.
This step is optional. After obtaining the living body detection result, the electronic device can perform a corresponding action according to the result.
For example, referring to FIG. 1b, the user makes a video call with another user through the electronic device 101. The electronic device 101 obtains the current image through the camera 105. If the living body detection result shows that a face in the current image is a living body, the electronic device can start a face tracking algorithm to track that live face and adjust the portrait to the center of the video frame; for example, the face 106 in FIG. 1b can be placed in the middle of the frame. If no living body detection result shows that a living body is present, the electronic device can continue the living body detection without adjusting the frame, or adjust the angle of the camera and reacquire images for living body detection.
In some embodiments, the electronic device 101 can continuously perform living body detection during a video call to filter out non-living bodies. For example, if during the call the user holds up the phone to show the other party a photo of a third person, the electronic device 101 can determine through living body detection that the face in the photo is not a living body, and will not adjust the photo to the middle of the frame.
In some other embodiments, after the electronic device 101 identifies a live face, it can track that face with the face tracking algorithm and stop performing living body detection; when the live face disappears, living body detection is restarted.
In still other embodiments, if the living body detection result shows that a live face is present, the electronic device can further judge whether the live face belongs to a child. If the electronic device determines that the live face belongs to a child, the child mode is activated. If the living body detection result shows that no live face is present, the electronic device can continue to display the current interface, or enter the standby state as soon as possible to reduce power consumption.
The embodiments of the present application do not limit the actions performed according to the living body detection result; the electronic device may perform any action according to the result.
It should be noted that tracking a live face is only one application of the living body detection neural network provided in the embodiments of the present application. Any scenario requiring living body detection may apply the solution provided in the embodiments of the present application, for example payment through an electronic device, child mode, and gesture control. The embodiments of the present application do not limit the application scenarios of the living body detection neural network.
FIG. 6 shows a flowchart of another living body detection method provided by an embodiment of the present application. The living body detection method can be applied to the electronic devices introduced above, to electronic devices not introduced above, or run in a cloud server. This application does not limit the specific type of the electronic device, as long as it has computing capability.
S602: Acquire a second image of the target object.
A second image of the target object is acquired, where the target object is the face on which living body detection needs to be performed. The second image may be an RGB image.
The second image can be obtained directly from the camera, or it can be a processed image; for example, the second image can be obtained in the manner of steps S502-S508 shown in FIG. 5.
The embodiments of the present application do not limit the specific manner of acquiring the second image.
S604: Generate a third depth image according to the second image.
The second image is input into the living body detection neural network. The electronic device can generate the third depth image corresponding to the second image through the living body detection neural network, specifically through the depth image generation network in the living body detection neural network.
For the specific generation manner, refer to step S204 shown in FIG. 2 and the coarse-to-fine method shown in FIG. 3.
S606: Acquire the global features according to the second image and the third depth image.
After the third depth image is obtained, the features of the second image and the third depth image can be extracted through the detection network in the living body detection neural network, and the features of the two images are then fused to form a fused feature map. The detection network may include two independent backbone networks that extract the features of the second image and the third depth image respectively.
For the specific process, refer to step S208 shown in FIG. 2.
After the fused feature map is obtained, the global features can be further output through pooling or similar operations.
S608: Determine the living body detection result based on the global features.
After the global features are obtained, the living body detection result can be determined through the detection network. The living body detection result indicates whether the target object is a living body. For the specific determination manner, refer to step S210 shown in FIG. 2; the difference is that at this point the detection network directly outputs the judgment of whether the target object is a living body, without passing through a loss function layer to output a probability vector.
In other words, the detection network in the living body detection neural network can determine whether the target object in the second image is a living body according to the second image and the third depth image corresponding to the second image.
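Putting the inference path together, a minimal sketch reusing the illustrative modules sketched earlier (CoarseToFineDepth, the two backbones, and GlobalHead); at inference, an argmax over the logits replaces the loss function layer, and the class index for living body is an assumption:

```python
import torch

# reusing the illustrative modules sketched earlier in this description
depth_net = CoarseToFineDepth()
head = GlobalHead(in_ch=128)

second_image = torch.randn(1, 3, 112, 112)  # preprocessed RGB face
with torch.no_grad():
    third_depth = depth_net(second_image)   # S604: generate the depth image
    fused = torch.cat([rgb_backbone(second_image),
                       depth_backbone(third_depth)], dim=1)  # S606: fuse
    logits = head(fused)                    # S608: classify global features
    is_live = logits.argmax(dim=1).item() == 0  # 0 = living body (assumed)
```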
The living body detection method provided in the embodiments of the present application does not require additional hardware to obtain multi-modal data: the depth image can be generated directly from the RGB image, the depth image and the RGB image are fused, and living body detection is performed according to the fused features. This detection approach guarantees the accuracy of living body detection without requiring additional cost. Meanwhile, the living body detection method provided in the embodiments of the present application can use lightweight convolutional neural networks in the process of generating the depth image, which greatly reduces the amount of computation.
根据前述方法,图7为本申请实施例提供的服务器的结构示意图,如图7所示,该服务器可以集成在电子设备上,也可以放置在云端,也可以为芯片或电路,比如可设置于电子设备的芯片或电路,再比如可设置于云端计算平台内的芯片或电路。According to the foregoing method, FIG. 7 is a schematic diagram of the structure of the server provided by the embodiment of the application. As shown in FIG. The chip or circuit of an electronic device, for example, a chip or circuit that can be set in a cloud computing platform.
进一步的,该服务器1301还可以进一步包括总线系统,其中,处理器1302、存储器1304、通信接口1303可以通过总线系统相连。Further, the server 1301 may further include a bus system, wherein the processor 1302, the memory 1304, and the communication interface 1303 may be connected through the bus system.
应理解,上述处理器1302可以是一个芯片。例如,该处理器1302可以是现场可编程门阵列(field programmable gate array,FPGA),可以是专用集成芯片(application specific integrated circuit,ASIC),还可以是系统芯片(system on chip,SoC),还可以是中央处理器(central processor unit,CPU),还可以是网络处理器(network processor,NP),还可以是数字信号处理电路(digital signal processor,DSP),还可以是微控制器(micro controller unit,MCU),还可以是可编程控制器(programmable logic device,PLD)或其他集成芯片。It should be understood that the aforementioned processor 1302 may be a chip. For example, the processor 1302 may be a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), or a system on chip (SoC). It can be a central processor unit (CPU), a network processor (NP), a digital signal processing circuit (digital signal processor, DSP), or a microcontroller (microcontroller). unit, MCU), and may also be a programmable logic device (PLD) or other integrated chips.
In an implementation process, the steps of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 1302 or by instructions in the form of software. The steps of the method disclosed with reference to the embodiments of this application may be directly performed by a hardware processor, or performed by a combination of hardware and software modules in the processor 1302. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1304, and the processor 1302 reads information from the memory 1304 and completes the steps of the foregoing method in combination with its hardware.
It should be noted that the processor 1302 in this embodiment of this application may be an integrated circuit chip with a signal processing capability. In an implementation process, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The foregoing processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed with reference to the embodiments of this application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads information from the memory and completes the steps of the foregoing method in combination with its hardware.
It can be understood that the memory 1304 in this embodiment of this application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (read-only memory, ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (random access memory, RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static random access memory (static RAM, SRAM), a dynamic random access memory (dynamic RAM, DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and a direct rambus random access memory (direct rambus RAM, DR RAM). It should be noted that the memories of the systems and methods described in this specification are intended to include, but are not limited to, these and any other suitable types of memories.
When the server 1301 corresponds to the server in the foregoing method, the server 1301 may include a processor 1302, a communication interface 1303, and a memory 1304. The memory 1304 is configured to store instructions, and the processor 1302 is configured to execute the instructions stored in the memory 1304, to implement the solution related to the server in the method corresponding to any one of the foregoing embodiments.
For concepts, explanations, detailed descriptions, and other steps of the server that are related to the technical solutions provided in the embodiments of this application, refer to the descriptions of this content in the foregoing method or other embodiments. Details are not repeated here.
Based on the foregoing embodiments and the same concept, FIG. 8 is a schematic diagram of a server according to an embodiment of this application. As shown in FIG. 8, the server 1501 may be a server, or may be a chip or circuit, for example, a chip or circuit that can be disposed in a server.
For concepts, explanations, detailed descriptions, and other steps of the server that are related to the technical solutions provided in the embodiments of this application, refer to the descriptions of this content in the foregoing method or other embodiments. Details are not repeated here.
It can be understood that for functions of the units in the server 1501, refer to the implementation of the corresponding method embodiments. Details are not repeated here.
It should be understood that the foregoing division of the units of the server is merely a division of logical functions, and in actual implementation, all or some of the units may be integrated into one physical entity or may be physically separate. In this embodiment of this application, the transceiver unit 1503 may be implemented by the communication interface 1303 in FIG. 7, and the processing unit 1502 may be implemented by the processor 1302 in FIG. 7.
According to the methods provided in the embodiments of this application, this application further provides a computer program product. The computer program product includes computer program code, and when the computer program code is run on a computer, the computer is caused to perform the method of any one of the embodiments shown in FIG. 2 to FIG. 6.
According to the methods provided in the embodiments of this application, the embodiments of this application further provide a computer-readable storage medium. The computer-readable medium stores program code, and when the program code is run on a computer, the computer is caused to perform the method of any one of the embodiments shown in FIG. 2 to FIG. 6.
According to the methods provided in the embodiments of this application, the embodiments of this application further provide an electronic device, including the foregoing server.
According to the methods provided in the embodiments of this application, the embodiments of this application further provide a system, including the foregoing operating device, one or more authentication devices, and the foregoing server.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used for implementation, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or some of the procedures or functions according to the embodiments of this application are generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (digital subscriber line, DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a high-density digital video disc (digital video disc, DVD)), a semiconductor medium (for example, a solid state disk (solid state disc, SSD)), or the like.
The server in each of the foregoing apparatus embodiments corresponds to the server in the method embodiments, and the corresponding steps are performed by corresponding modules or units. For example, the communication unit (transceiver) performs the receiving or sending steps in the method embodiments, and steps other than sending and receiving may be performed by the processing unit (processor). For functions of specific units, refer to the corresponding method embodiments. There may be one or more processors.
The terms "component", "module", "system", and the like used in this specification indicate computer-related entities, hardware, firmware, combinations of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable file, an execution thread, a program, and/or a computer. As illustrated, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or an execution thread, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components may be executed from various computer-readable media having various data structures stored thereon. The components may communicate by using local and/or remote processes based on, for example, a signal having one or more data packets (for example, data from two components interacting with another component in a local system or a distributed system, and/or data exchanged with other systems across a network such as the Internet by means of a signal).
A person of ordinary skill in the art may be aware that the various illustrative logical blocks and steps described with reference to the embodiments disclosed in this specification can be implemented by electronic hardware or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered as going beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for a specific working process of the foregoing system, apparatus, and unit, refer to the corresponding process in the foregoing method embodiments. Details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, the division of units is merely a division of logical functions, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or a part of the technical solutions may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or the like) to perform all or some of the steps of the methods in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (14)

  1. A living body detection method, applied to an electronic device, characterized by comprising:
    acquiring a first image, wherein the first image is an RGB image, and the first image comprises a face image of a target object;
    obtaining a first depth image according to the first image and a depth image generation network;
    determining a living body detection result according to the first image, the first depth image, and a detection network, wherein the living body detection result is used to indicate whether the target object is a living body; and
    performing an action according to the living body detection result.
  2. The method according to claim 1, wherein the depth image generation network comprises a first neural network and a second neural network; and
    the obtaining a first depth image according to the first image and the depth image generation network specifically comprises:
    extracting a coarse-grained feature of the first image through the first neural network;
    extracting a fine-grained feature of the first image through the second neural network; and
    generating the first depth image according to the coarse-grained feature and the fine-grained feature.
  3. The method according to claim 2, wherein the generating the first depth image according to the coarse-grained feature and the fine-grained feature comprises:
    acquiring a fusion feature, wherein the fusion feature is obtained by fusing the coarse-grained feature and the fine-grained feature through a fusion algorithm; and
    generating the first depth image according to the fusion feature.
  4. The method according to claim 2, wherein the first neural network and the second neural network are lightweight convolutional neural networks.
  5. The method according to claim 1, wherein the detection network comprises a third neural network and a fourth neural network; and
    the determining a living body detection result according to the first image, the first depth image, and the detection network specifically comprises:
    extracting a feature of the first image through the third neural network;
    extracting a feature of the first depth image through the fourth neural network;
    acquiring a feature map according to the feature of the first image and the feature of the first depth image; and
    determining the living body detection result according to the feature map.
  6. The method according to claim 5, wherein the determining the living body detection result according to the feature map specifically comprises:
    performing global pooling on the feature map to acquire a global feature; and
    determining the living body detection result according to the global feature.
  7. The method according to claim 1, wherein the method further comprises:
    acquiring a second image;
    acquiring the face image in the second image according to a face detection algorithm; and
    determining the first image according to the face image.
  8. The method according to claim 7, wherein the first image is a face image that has been aligned and preprocessed.
  9. The method according to claim 1, wherein the performing an action according to the living body detection result comprises:
    when the living body detection result indicates that the target object is a living body, performing portrait tracking on the target object.
  10. The method according to claim 1, wherein the performing an action according to the living body detection result comprises:
    when the living body detection result indicates that the target object is a living body, determining whether the target object is a child; and
    when the target object is a child, switching to a child mode.
  11. The method according to any one of claims 1 to 10, wherein the depth image generation network and the detection network belong to a living body detection neural network; and
    the living body detection neural network is obtained through joint training based on a local feature and the global feature.
  12. An electronic device, characterized by comprising:
    one or more processors;
    a memory;
    a camera; and
    one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprise instructions, and when the instructions are executed by the electronic device, the electronic device is caused to perform the living body detection method according to any one of claims 1 to 11.
  13. A computer-readable storage medium, characterized by comprising computer instructions, wherein when the computer instructions are run on an electronic device, the electronic device is caused to perform the living body detection method according to any one of claims 1 to 11.
  14. A chip, wherein the chip is coupled to a memory in an electronic device, so that the chip invokes, during running, program instructions stored in the memory, to cause the electronic device to perform the living body detection method according to any one of claims 1 to 11.
PCT/CN2021/088272 2020-04-26 2021-04-20 Monocular camera-based liveness detection method, device, and readable storage medium WO2021218695A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010338191.8A CN113553887A (en) 2020-04-26 2020-04-26 Monocular camera-based in-vivo detection method and device and readable storage medium
CN202010338191.8 2020-04-26

Publications (1)

Publication Number Publication Date
WO2021218695A1 true WO2021218695A1 (en) 2021-11-04

Family

ID=78129851

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/088272 WO2021218695A1 (en) 2020-04-26 2021-04-20 Monocular camera-based liveness detection method, device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113553887A (en)
WO (1) WO2021218695A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115421591B (en) * 2022-08-15 2024-03-15 珠海视熙科技有限公司 Gesture control device and image pickup apparatus

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635770A (en) * 2018-12-20 2019-04-16 上海瑾盛通信科技有限公司 Biopsy method, device, storage medium and electronic equipment
CN110674759A (en) * 2019-09-26 2020-01-10 深圳市捷顺科技实业股份有限公司 Monocular face in-vivo detection method, device and equipment based on depth map

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114202807A (en) * 2021-11-30 2022-03-18 北京百度网讯科技有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN115174138A (en) * 2022-05-25 2022-10-11 北京旷视科技有限公司 Camera attack detection method, system, device, storage medium and program product
CN115174138B (en) * 2022-05-25 2024-06-07 北京旷视科技有限公司 Camera attack detection method, system, device, storage medium and program product

Also Published As

Publication number Publication date
CN113553887A (en) 2021-10-26

Similar Documents

Publication Publication Date Title
US10956714B2 (en) Method and apparatus for detecting living body, electronic device, and storage medium
WO2021218695A1 (en) Monocular camera-based liveness detection method, device, and readable storage medium
WO2017181769A1 (en) Facial recognition method, apparatus and system, device, and storage medium
WO2021031609A1 (en) Living body detection method and device, electronic apparatus and storage medium
WO2021078001A1 (en) Image enhancement method and apparatus
WO2020243967A1 (en) Face recognition method and apparatus, and electronic device
CN112037162B (en) Facial acne detection method and equipment
CN105654039B (en) The method and apparatus of image procossing
EP3776296B1 (en) Apparatus and method for recognizing an object in electronic device
WO2021008551A1 (en) Fingerprint anti-counterfeiting method, and electronic device
WO2021036853A1 (en) Image processing method and electronic apparatus
WO2022179376A1 (en) Gesture control method and apparatus, and electronic device and storage medium
Vazquez-Fernandez et al. Built-in face recognition for smart photo sharing in mobile devices
CN109937434B (en) Image processing method, device, terminal and storage medium
CN114140365B (en) Event frame-based feature point matching method and electronic equipment
CN112052830B (en) Method, device and computer storage medium for face detection
CN112036331A (en) Training method, device and equipment of living body detection model and storage medium
WO2020103732A1 (en) Wrinkle detection method and terminal device
US20220130019A1 (en) Electronic device and method for processing image by same
CN107977636B (en) Face detection method and device, terminal and storage medium
CN115150542B (en) Video anti-shake method and related equipment
CN110348272B (en) Dynamic face recognition method, device, system and medium
CN114827442A (en) Method and electronic device for generating image
CN115049819A (en) Watching region identification method and device
CN113591526A (en) Face living body detection method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21797442

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21797442

Country of ref document: EP

Kind code of ref document: A1