CN112883758B - Living body detection method and device

Living body detection method and device

Info

Publication number
CN112883758B
CN112883758B
Authority
CN
China
Prior art keywords
area
effective
rectangular area
image
face
Prior art date
Legal status
Active
Application number
CN201911197600.0A
Other languages
Chinese (zh)
Other versions
CN112883758A (en)
Inventor
任志浩
邹保珠
华丛一
王升国
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201911197600.0A
Publication of CN112883758A
Application granted
Publication of CN112883758B
Legal status: Active
Anticipated expiration

Classifications

    • G06V40/45 Detection of the body part being alive (spoof detection, e.g. liveness detection)
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06V40/14 Vascular patterns
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V40/193 Preprocessing; Feature extraction (under G06V40/18 Eye characteristics, e.g. of the iris)

Abstract

The application discloses a living body detection method. Based on an infrared image of a target to be detected, image data in an effective area is extracted and input to a trained neural network classifier for recognition, and a detection result is obtained from the output of the trained neural network classifier. The trained neural network classifier classifies images according to whether they contain vein distribution information of a biological feature: an infrared image of the target to be detected that contains vein distribution information of the biological feature is recognized as a living body, and one that does not contain vein distribution information is recognized as a non-living body. The detection method provided by the application has stable performance, strong resistance to attacks and very high security.

Description

Living body detection method and device
Technical Field
The application relates to the field of image recognition detection, in particular to a living body detection method.
Background
Living body recognition and detection mainly identifies biological feature information on a living body and uses it as a vital sign to distinguish genuine biometrics from those forged with non-living materials such as photos, silica gel and plastics. In plain terms, the detection determines that the object being examined is indeed a "living body" with vital signs, rather than a prosthesis without vital signs such as a photograph, a video or something else.
Take recognition and detection of a living human face as an example. Current face liveness detection techniques mainly include interactive actions, 3D imaging (multi-view imaging, structured light, TOF, etc.), video streaming and so on, wherein:
the interactive approach requires the user to complete corresponding actions, such as blinking, smiling or reading aloud, in cooperation with instructions, and distinguishes photos from live faces by the judged change of action state; it requires user cooperation, so the user experience is poor, and once all the interactive instructions have been obtained, a video can be recorded specifically for them, making such a video attack difficult to defend against;
3D imaging recognition and detection is based on depth images, is little affected by object material and illumination, and can distinguish real and fake faces well, but it has a very high false detection rate for some 3D-printed masks;
video-stream recognition and detection is based on real-time video streams and is prone to false detection on replayed video.
Disclosure of Invention
The application provides a living body detection method for reducing false detection.
In one aspect, the present application provides a living body detection method, the method comprising,
based on the infrared image of the object to be detected, image data in the effective area is extracted,
the extracted effective area image data is input into a trained neural network classifier for recognition,
obtaining a detection result from the output of the trained neural network classifier,
the trained neural network classifier classifies whether the images contain vein distribution information of biological characteristics or not, the infrared images of targets to be detected containing the vein distribution information of the biological characteristics are identified as living bodies, and the infrared images of targets to be detected which do not contain the vein distribution information of the biological characteristics are identified as non-living bodies.
Preferably, the method further comprises preprocessing the extracted effective area image data by one or any combination of the following: equalizing, normalizing and filtering;
the infrared image of the object to be detected comprises a face image including facial features;
the neural network classifier is trained alternately by a set of positive samples, which are infrared images containing venous distribution information of biological features, and a set of negative samples, which are any images not containing venous distribution information of biological features.
Preferably, the extracting the image data in the effective area includes,
determining the width of an effective rectangular area according to the distance between pupils, determining the height of the effective rectangular area according to the distance from the hairline of the forehead to the chin,
acquiring left eye pupil coordinates, right eye pupil coordinates, left mouth corner coordinates and right mouth corner coordinates in the face image, calculating the average value of the 4 coordinates to obtain the center position of the face,
determining the position of the effective rectangular area according to the fact that the distance between the center of the effective rectangular area and the center position of the face is smaller than a set first distance threshold value,
image data within the effective rectangular area is extracted.
Preferably, the extracting the image data in the effective area includes,
determining the width of an effective rectangular area according to the distance between pupils, determining the height of the effective rectangular area according to the width to obtain the effective rectangular area at least limited except the area above eyes,
acquiring left eye pupil coordinates, right eye pupil coordinates, left mouth corner coordinates and right mouth corner coordinates in the face image, calculating the average value of the 4 coordinates to obtain the center position of the face,
determining the position of the effective rectangular area according to the center position of the face and the range defined by the effective rectangular area, with the aim of increasing the number of image pixels defined by the effective rectangular area,
image data within the effective rectangular area is extracted.
Wherein the determining the width of an effective rectangular area according to the inter-pupil distance comprises taking the product of the inter-pupil distance and the first coefficient as the width of the effective rectangular area,
the determining the height of the effective rectangular area according to the width comprises taking the product of the second coefficient and the width of the effective rectangular area as the height of the effective rectangular area,
wherein the first coefficient is greater than 1 and the second coefficient is determined from the height below the eye to a portion or the entire chin;
the determining the position of the effective rectangular area according to the center position of the face and the range defined by the effective rectangular area, with the aim of increasing the number of image pixels defined by the effective rectangular area, includes,
the first position is determined according to the fact that the height from the ordinate of the center of the face to the chin occupies the effective rectangular area and is larger than a first threshold value, the second position is determined according to the fact that the center of the face is on the central line of the width direction of the effective rectangular area or is deviated from the central line and smaller than a second threshold value, the image pixels defined by the effective rectangular area comprise cheek areas from the lower part of eyes to part of or the whole chin, and the position of the effective rectangular area is determined.
Preferably, the extracting the image data in the effective area includes,
extracting facial image contour and eye lower eyelid image contour,
a first curve segment intersecting the left face image contour at a first intersection point and intersecting the right face image contour at a second intersection point is formed below the lower eyelid image contour,
forming a closed curve by a first facial image contour including a mandible and the first curve section between the first intersection point and the second intersection point, wherein a closed area formed by the closed curve is taken as an effective area;
image data within the effective area is extracted.
Preferably, the extracting the image data in the effective area includes,
removing areas of eyes and a mouth in the face image to obtain a residual face image, taking the residual face image as an effective area, and extracting image data in the effective area;
wherein the region of eyes and mouth in the facial image is removed, including,
removing a first transverse strip-shaped area penetrating through the eyes or two rectangular areas penetrating through the pupil parts respectively and a second transverse strip-shaped area penetrating through the mouth parts;
the width of the first transverse strip-shaped area is the distance between the left outer eye corner and the right outer eye corner, and the height of the first transverse strip-shaped area is the average value of the longitudinal distances between the upper eyelid and the lower eyelid in two eyes; the distance between the center of the first transverse strip-shaped area and the centers of two eyes is smaller than a set fourth distance threshold value, and the center positions of the eyes of the two eyes are the average value of the coordinates of two pupils;
the width of the second transverse strip-shaped area is the distance between the left mouth corner and the right mouth corner, and the height of the second transverse strip-shaped area is the distance between the average value of the longitudinal coordinates of the two lip peaks and the longitudinal coordinate of the edge valley of the lower edge of the mouth; the distance between the center of the second transverse strip-shaped area and the center of the mouth is smaller than a set fifth distance threshold, and the position of the center of the mouth is the average value of the left mouth angular coordinate and the right mouth angular coordinate; the edge valley is positioned at the lowest position of the lower edge of the mouth;
the width of the rectangular area is the product of the interpupillary distance and a fourth coefficient, the height is the product of the width and a fifth coefficient, and the distance between the center of the rectangular area and the pupil is smaller than a set third distance threshold value, wherein the fourth coefficient and the fifth coefficient are smaller than 1.
Preferably, the extracting the image data in the effective area further includes,
establishing a mouth mask area for setting a pixel value of the mouth area to be 0, wherein the mouth mask area is a rectangular area, the width of the rectangular area is the distance of the horizontal coordinate between the left mouth angle and the right mouth angle, and the height of the rectangular area is the product of the rectangular width and a third coefficient; the distance between the center of the rectangular area and the center of the mouth is smaller than a set second distance threshold value, wherein a third coefficient is smaller than 1;
the pixel value of each pixel point in the mouth mask region is set to 0.
Preferably, the extracting the image data in the effective area further includes establishing an eye mask area for setting the eye area pixel values to 0, the eye mask area including two rectangular areas of the same size, the width of the rectangular area being the product of the inter-pupillary distance and a fourth coefficient, the height being the product of the width and a fifth coefficient, and the distance between the center of the rectangular area and the pupil being smaller than a set third distance threshold, wherein the fourth coefficient and the fifth coefficient are both less than 1,
the pixel value of each pixel point in the eye mask area is set to 0.
In another aspect, the application provides a method of training a neural network classifier, the method comprising,
extracting image data in an effective area based on sample image data, wherein the samples comprise positive samples and negative samples, the positive samples are infrared images containing vein distribution information of biological characteristics, and the negative samples are any images not containing the vein distribution information of the biological characteristics;
the effective area image data extracted based on the samples is alternately input to the neural network classifier for training according to a set of positive samples and a set of negative samples,
and saving the network parameters of the trained neural network classifier.
In yet another aspect, the application provides a living body detection device comprising a memory and a processor, wherein,
the memory is stored with an application program,
the processor executes the application program to implement any of the living body detection steps described above.
A further aspect of the present application provides a computer-readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of any of the above-described living body detection methods.
According to the method of the application, whether the infrared image of the object to be detected belongs to a living body can be determined by the trained neural network classifier according to whether vein distribution information is contained in the image. The method can effectively resist attacks using photos, videos, models and the like, which improves the security of the detection device; because the vein distribution image is stable and difficult to forge, the robustness of detection is improved; and, combined with the deep-learning capability of the neural network classifier, the method is simple to implement, consumes little time and has high recognition accuracy.
Drawings
Fig. 1 is a schematic illustration of venous features of a human face.
Fig. 2 is a schematic view of an effective rectangular area.
Fig. 3a and 3b are schematic diagrams of mask areas.
Fig. 4a to 4d show schematic diagrams of the relationship between the effective rectangular area and the face position.
Fig. 5 is a schematic view of an irregular effective region.
Fig. 6a and 6b are schematic diagrams of the effective area remaining after the area is removed based on the face image.
Fig. 7 is a schematic diagram of training a neural network classifier.
Fig. 8 is a schematic flow chart of living body detection according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings, in order to make the objects, technical means and advantages of the present application more apparent.
The applicant found that current living body recognition and detection technologies are based on the external surface features of a biometric and are affected by factors such as ambient light, changes in the external surface features and ageing, so that the recognition rate and long-term stability are problematic. In addition, external surface features are very easy to collect, for example a person's facial information, which in turn increases the risk of imitation. To circumvent the risk of forgery that exists when living body detection relies on the external surface features of a biometric, living body recognition and detection based on the intrinsic features of the biometric is desirable.
Vein-based recognition and detection has the natural advantage that the features are not readily collected and are not easily counterfeited. Vein detection and recognition relies on an internal characteristic of the human body, namely the infrared feedback of subcutaneous vascular tissue beneath the body surface, and therefore offers stronger stability and privacy. Referring to fig. 1, fig. 1 is a schematic representation of the venous features of a person's face: the facial vein runs from the angular vein at the inner canthus, descends along the face alongside the facial artery, and merges into the veins of the neck at the level of the hyoid bone. Deoxyhemoglobin in veins absorbs near-infrared light differently from the surrounding tissue, so clear imaging can be achieved in the near-infrared band. In the figure, the venous vessels are gray and the arterial vessels are black.
In view of this, during living body recognition and detection, whether the object to be detected is a living body can be identified by distinguishing whether it contains venous vascular tissue, without a complex comparison process; the image of the object to be detected is recognized with machine learning to obtain a detection result.
In the following, the image recognition will be described by taking the object to be detected as including facial features as an example, it should be understood that the present application is not limited to facial features, but may also refer to other features having venous vascular tissue such as a finger, a palm, etc.
Referring to fig. 8, fig. 8 is a schematic flow chart of living body detection according to an embodiment of the present application. The method of detection includes the steps of,
step 801, extracting image data in an effective area based on an image to be detected.
In order to save the calculation amount and improve the final judgment accuracy, the image data of the effective area is extracted.
In one embodiment, referring to fig. 2, fig. 2 is a schematic view of the effective rectangular area. The left eye pupil coordinates, the right eye pupil coordinates, the left mouth corner coordinates and the right mouth corner coordinates in the face image are acquired, and the average of the 4 coordinates is calculated to obtain the face center position,
expressed by the mathematical formula:
fc_x=(eyel_x+eyer_x+mouthl_x+mouthr_x)/4
fc_y=(eyel_y+eyer_y+mouthl_y+mouthr_y)/4
wherein the coordinates of the center position of the face are (fc_x, fc_y), the left eye pupil coordinates are (eyel_x, eyel_y), the right eye pupil coordinates (eyer_x, eyer_y), the left mouth corner coordinates (mouthl_x, mouthl_y), and the right mouth corner coordinates (mouthr_x, mouthr_y).
An effective rectangular area for extracting image data is formed by taking 2 times the inter-pupillary distance as the width of the effective rectangular area (dashed box in fig. 2) and at least 70% of that width as its height, where the width and height of the effective rectangular area are expressed as:
facewidth=w*(eyer_x-eyel_x)
faceheight=h*facewidth,
wherein, facewidth is the width of the effective rectangular area, w is a first coefficient, faceheight is the height of the effective rectangular area, and h is a second coefficient; wherein the first coefficient is greater than 1, preferably a value of 2, and the second coefficient is determined according to the height of the forehead hairline to part or all of the chin.
The position (positioning) of the effective rectangular area on the face can be determined by combining the center position of the face and the height and width of the effective rectangular area according to the aim of increasing the number of image pixels defined by the effective rectangular area; for example, the distance between the center of the effective rectangular area and the center position of the face is smaller than a set first distance threshold, and preferably the center of the effective rectangular area coincides with the center position of the face.
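As a concrete illustration of the computation above, the following Python/numpy sketch derives the face center and the effective rectangular area from the four landmarks. The function name, the default coefficient values and the clamping to the image bounds are illustrative assumptions rather than part of the patent text.

```python
# Illustrative sketch: face center and effective rectangular area of embodiment one.
import numpy as np

def effective_rectangle(eyel, eyer, mouthl, mouthr, img_w, img_h, w_coef=2.0, h_coef=0.7):
    """Return (x0, y0, x1, y1) of the effective rectangular area.

    eyel/eyer/mouthl/mouthr are (x, y) pixel coordinates of the left/right pupils and the
    left/right mouth corners; w_coef is the first coefficient (>1, preferably 2) and
    h_coef the second coefficient (here at least 0.7 of the width).
    """
    pts = np.array([eyel, eyer, mouthl, mouthr], dtype=np.float32)
    fc_x, fc_y = pts.mean(axis=0)                 # face center = mean of the 4 landmarks
    facewidth = w_coef * (eyer[0] - eyel[0])      # width = w * inter-pupillary distance
    faceheight = h_coef * facewidth               # height = h * width
    # Center the rectangle on the face center and clamp to the image bounds.
    x0 = int(max(0, fc_x - facewidth / 2))
    y0 = int(max(0, fc_y - faceheight / 2))
    x1 = int(min(img_w, fc_x + facewidth / 2))
    y1 = int(min(img_h, fc_y + faceheight / 2))
    return x0, y0, x1, y1
```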
Image data is extracted in accordance with an effective rectangular area including an eye and a mouth. Since the eyes and mouth regions in the face are free of vein tissue, image data of the regions free of vein tissue can be removed, that is, pixel values of the eyes and mouth regions are set to 0. The region formed by these plural continuous pixel points appears black in the image, and is referred to as a mask region in the present application.
Referring to fig. 3a and 3b, fig. 3a and 3b are schematic views of mask areas. As shown in fig. 3a, a mouth mask region for setting a mouth region pixel value of 0 is established, the region being rectangular, the width of the rectangle being the distance between the left and right mouth corners on the abscissa, the height of the rectangle being about two-thirds of the width of the rectangle; the distance between the center of the mouth mask region and the center of the mouth is smaller than a set second distance threshold, and preferably, the center of the mouth mask region coincides with the center position of the mouth. Wherein the central position of the mouth is the average value of the left mouth angular position and the right mouth angular position,
expressed by the mathematical formula:
mouthwidth=mouthr_x-mouthl_x
mouthheight=α×mouthwidth
mc_x=(mouthl_x+mouthr_x)/2
mc_y=(mouthl_y+mouthr_y)/2
wherein the coordinates of the center position of the mouth are (mc_x, mc_y), the left mouth corner coordinates are (mouthl_x, mouthl_y), the right mouth corner coordinates are (mouthr_x, mouthr_y), the width of the mouth mask region is mouthwidth, the height of the mouth mask region is mouthheight, and α is a third coefficient.
As shown in fig. 3b, an eye mask region for setting the pixel value of the eye region to 0 is respectively established for both eyes, the region is rectangular, the width of the rectangle is about two thirds of the interpupillary distance, and the height of the rectangle is about one third of the width of the rectangle; the distance between the center of the eye mask area and the pupil is smaller than a set third distance threshold, and preferably the center of the eye mask area coincides with the pupil position. Namely, the center of the left eye mask area coincides with the left eye pupil, the center of the right eye mask area coincides with the right eye pupil,
expressed by the mathematical formula:
eyewidth=θ×(eyer_x-eyel_x)
eyeheight=σ×eyewidth
wherein the left eye pupil coordinates are (eyel_x, eyel_y), the right eye pupil coordinates are (eyer_x, eyer_y), the width of the eye mask area is eyewidth, the height of the eye mask area is eyeheight, θ is the fourth coefficient and σ is the fifth coefficient; the fourth and fifth coefficients are each less than 1, preferably the fourth coefficient takes a value of 2/3 and the fifth coefficient takes a value of 1/3.
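A minimal sketch, under the same assumption that the landmarks are available, of how the mouth and eye mask regions described above could be zeroed. The default coefficients follow the preferred values in the text (α=2/3, θ=2/3, σ=1/3); the helper names are illustrative only.

```python
# Illustrative sketch: zeroing the mouth and eye mask regions in a grayscale image.
def apply_mouth_and_eye_masks(img, eyel, eyer, mouthl, mouthr,
                              alpha=2/3, theta=2/3, sigma=1/3):
    """Set the mouth and eye mask regions to 0 in-place; img is a grayscale numpy array."""
    def zero_rect(cx, cy, w, h):
        x0, x1 = int(cx - w / 2), int(cx + w / 2)
        y0, y1 = int(cy - h / 2), int(cy + h / 2)
        img[max(0, y0):y1, max(0, x0):x1] = 0

    # Mouth mask: width = mouthr_x - mouthl_x, height = alpha * width, centered on the mouth.
    mouth_w = mouthr[0] - mouthl[0]
    mc_x = (mouthl[0] + mouthr[0]) / 2
    mc_y = (mouthl[1] + mouthr[1]) / 2
    zero_rect(mc_x, mc_y, mouth_w, alpha * mouth_w)

    # Eye masks: width = theta * inter-pupillary distance, height = sigma * width,
    # one rectangle centered on each pupil.
    eye_w = theta * (eyer[0] - eyel[0])
    eye_h = sigma * eye_w
    zero_rect(eyel[0], eyel[1], eye_w, eye_h)
    zero_rect(eyer[0], eyer[1], eye_w, eye_h)
    return img
```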
In the second embodiment, a rectangular area extending from below the eyes to part of or the whole chin is taken as the effective area. Specifically, the face center position is obtained by averaging the left eye pupil coordinates, the right eye pupil coordinates, the left mouth corner coordinates and the right mouth corner coordinates in the face image,
expressed by the mathematical formula:
fc_x=(eyel_x+eyer_x+mouthl_x+mouthr_x)/4
fc_y=(eyel_y+eyer_y+mouthl_y+mouthr_y)/4
wherein the coordinates of the center position of the face are (fc_x, fc_y), the left eye pupil coordinates are (eyel_x, eyel_y), the right eye pupil coordinates (eyer_x, eyer_y), the left mouth corner coordinates (mouthl_x, mouthl_y), and the right mouth corner coordinates (mouthr_x, mouthr_y).
An effective rectangular area for extracting image data is formed by taking 1.6 to 2 times the inter-pupillary distance as the width of the effective rectangular area and at least 70% of that width as its height, where the width and height of the effective rectangular area are expressed as:
facewidth=w*(eyer_x-eyel_x)
faceheight=h*facewidth,
wherein, facewidth is the width of the effective rectangular area, w is a first coefficient, faceheight is the height of the effective rectangular area, and h is a second coefficient; wherein the first coefficient is greater than 1 and the second coefficient is determined from the height below the eye to a portion or the entire chin.
Since the eye region does not contain vein tissue images, the image within the effective rectangular area covers the cheek region excluding the area above the eyes. The first position is determined so that the height from the face-center ordinate to the chin occupies more than a first threshold of the height of the effective rectangular area, and the second position is determined so that the face center lies on the centerline of the effective rectangular area in the width direction or deviates from it by less than a second threshold. For example, the first position may be determined so that the height from the face-center ordinate to the chin occupies at least 50% of the height of the effective rectangular area, and the second position so that the face center is on the width-direction centerline or deviates from it by 20% or less. The effective rectangular area thus covers the cheek region from below the eyes to part of or the whole chin, and the image data within it is extracted.
Referring to fig. 4a to 4d, fig. 4a to 4d show schematic diagrams of the relationship between the effective rectangular area and the face position, i.e., the positioning of the effective rectangular area. Fig. 4a and 4b show two cases in which the height from the face-center ordinate to the chin occupies less than 50% of the effective rectangular area; in both cases the face image data in the height direction within the effective rectangular area is limited, so the extracted image data cannot be used as effective data. Fig. 4c shows a case in which the face center deviates too far from the width-direction centerline of the effective rectangular area; the face image data in the width direction is then limited, and the extracted image data likewise cannot be used as effective data. Fig. 4d shows an ideal placement of the effective rectangular area relative to the face, covering the cheek region from below the eyes to part of the chin.
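The positioning criteria just described (at least 50% of the rectangle height between the face center and the chin, at most 20% offset from the width-direction centerline) can be checked with a small predicate. This is an illustrative sketch with assumed argument names, not part of the patent text.

```python
# Illustrative sketch: checking whether a candidate effective rectangle is well positioned.
def rectangle_well_positioned(rect, face_center, chin_y,
                              min_height_ratio=0.5, max_center_offset=0.2):
    """rect = (x0, y0, x1, y1); face_center = (fc_x, fc_y); chin_y = chin ordinate."""
    x0, y0, x1, y1 = rect
    height = y1 - y0
    width = x1 - x0
    # Condition 1: the face-center-to-chin span occupies at least min_height_ratio of the height.
    height_ok = (chin_y - face_center[1]) >= min_height_ratio * height
    # Condition 2: the face center deviates from the width-direction centerline by
    # no more than max_center_offset of the width.
    centerline_x = (x0 + x1) / 2
    offset_ok = abs(face_center[0] - centerline_x) <= max_center_offset * width
    return height_ok and offset_ok
```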
In the third embodiment, the effective area may be a closed irregular polygon formed by connecting several curve segments end to end, which distinguishes the image data to be extracted from the rest of the face image. For example, referring to fig. 5, fig. 5 is a schematic view of an irregular effective region. The facial image contour and the lower-eyelid image contour are extracted; below the lower-eyelid contour, a first curve segment is formed that intersects the left facial image contour at a first intersection point and the right facial image contour at a second intersection point. Between the first and second intersection points, the first facial image contour (including the mandible) and the first curve segment form a closed curve, and the region enclosed by this closed curve is taken as the effective area.
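For this third embodiment, a brief OpenCV sketch of turning such a closed contour into an effective-area mask is given below. It assumes the below-eye curve points and the jawline contour points have already been obtained from some landmark or contour detector, which the patent does not specify.

```python
# Illustrative sketch: irregular effective area of fig. 5 built as a filled polygon mask.
import cv2
import numpy as np

def irregular_effective_area(img, below_eye_pts, jaw_pts):
    """below_eye_pts: points of the first curve segment (left-to-right, below the lower eyelids).
    jaw_pts: facial-contour points including the mandible, ordered right-to-left so that the
    concatenation forms a closed curve. Returns the image with non-effective pixels zeroed."""
    closed = np.array(list(below_eye_pts) + list(jaw_pts), dtype=np.int32)
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    cv2.fillPoly(mask, [closed], 255)            # enclosed region = effective area
    return cv2.bitwise_and(img, img, mask=mask)  # keep only pixels inside the closed curve
```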
In the second and third embodiments described above, the mouth region is included in the extracted effective region image data. Since the mouth region is free of venous vascular tissue, the image data of the region free of venous vascular tissue can be removed, i.e. a mouth mask region is provided as described in fig. 3 a.
In a fourth embodiment, an area including eyes and a mouth in a face image is removed to extract an effective area image of the face,
referring to fig. 6a and 6b, fig. 6a and 6b are schematic views of the effective area remaining after the area is removed based on the face image. As shown in fig. 6a, the removal region comprises a first lateral strip-shaped region penetrating through the eyes, and a second lateral strip-shaped region penetrating through the mouth, wherein the width of the first lateral strip-shaped region is the distance between the left outer corner and the right outer corner, and the height of the first lateral strip-shaped region is the average value of the longitudinal distances between the upper eyelid and the lower eyelid in the two eyes; the distance between the center of the first transverse strip-shaped area and the centers of two eyes is smaller than a set fourth distance threshold, and preferably, the center of the first transverse strip-shaped area coincides with the center position of the eyes, wherein the center position of the eyes of two eyes is the average value of the coordinates of two pupils.
The width of the second transverse strip-shaped area is the distance between the left mouth angle and the right mouth angle, and the height of the second transverse strip-shaped area is the distance between the average value of the longitudinal coordinates of the two lip peaks and the longitudinal coordinate of the edge valley of the lower edge of the mouth; the distance between the center of the second transverse strip-shaped area and the center of the mouth is smaller than a set fifth distance threshold, and preferably, the center of the second transverse strip-shaped area coincides with the center of the mouth, wherein the position of the center of the mouth is the average value of the left mouth angular coordinate and the right mouth angular coordinate.
Expressed by the mathematical formula:
for the first lateral strip-shaped region,
Hx=|eyelo_x-eyero_x|
Hy=((eyeld_y-eyelu_y)+(eyerd_y-eyeru_y))/2
Hc_x=(eyel_x+eyer_x)/2
Hc_y=(eyel_y+eyer_y)/2
where Hx is the width of the first lateral stripe, eyelo_x is the abscissa of the left outer eye corner, and eyero_x is the abscissa of the right outer eye corner;
Hy is the height of the first lateral stripe, eyelu_y is the ordinate of the left upper eyelid, eyeld_y is the ordinate of the left lower eyelid, eyeru_y is the ordinate of the right upper eyelid, and eyerd_y is the ordinate of the right lower eyelid;
the center coordinates of the first lateral stripe are (Hc_x, Hc_y), the left eye pupil coordinates are (eyel_x, eyel_y), and the right eye pupil coordinates are (eyer_x, eyer_y).
For the second lateral strip-shaped region,
Vx=|mouthl_x-mouthr_x|
Vy=mouthd_y-(mouthlu_y+mouthru_y)/2
Vc_x=(mouthl_x+mouthr_x)/2
Vc_y=(mouthl_y+mouthr_y)/2
where Vx is the width of the second lateral stripe, mouthl_x is the abscissa of the left mouth corner, and mouthr_x is the abscissa of the right mouth corner;
Vy is the height of the second lateral stripe, mouthlu_y is the ordinate of the left lip peak, mouthru_y is the ordinate of the right lip peak, and mouthd_y is the ordinate of the edge valley of the lower edge of the mouth, the edge valley being located at the lowest position of the lower edge of the mouth;
the center coordinates of the second lateral stripe are (Vc_x, Vc_y).
And taking the remaining areas of the first transverse strip-shaped area and the second transverse strip-shaped area as effective areas, and extracting image data in the effective areas.
As shown in fig. 6b, the removal area includes rectangular areas penetrating through the pupil portion, respectively, the rectangular areas having the same size as the mask area in fig. 3b, and the rectangular areas are positioned at the same positions as the mask area in fig. 3 b.
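A sketch of the fourth embodiment's strip removal (fig. 6a), using the variable names of the formulas above; the function signature and the in-place zeroing helper are assumptions for illustration.

```python
# Illustrative sketch: zeroing the first and second lateral strip regions of fig. 6a so that
# the remaining face image forms the effective area.
def remove_eye_and_mouth_strips(img, eyel, eyer, eyelo_x, eyero_x,
                                eyelu_y, eyeld_y, eyeru_y, eyerd_y,
                                mouthl, mouthr, mouthlu_y, mouthru_y, mouthd_y):
    """img: grayscale numpy array; eyel/eyer: pupil (x, y); mouthl/mouthr: mouth corner (x, y);
    the remaining arguments are the outer-eye-corner abscissas and the eyelid / lip-peak /
    lower-edge-valley ordinates named in the text."""
    def zero_rect(cx, cy, w, h):
        x0, x1 = int(cx - w / 2), int(cx + w / 2)
        y0, y1 = int(cy - h / 2), int(cy + h / 2)
        img[max(0, y0):y1, max(0, x0):x1] = 0

    # First lateral strip across both eyes: width Hx, height Hy, centered at (Hc_x, Hc_y).
    Hx = abs(eyelo_x - eyero_x)
    Hy = ((eyeld_y - eyelu_y) + (eyerd_y - eyeru_y)) / 2
    zero_rect((eyel[0] + eyer[0]) / 2, (eyel[1] + eyer[1]) / 2, Hx, Hy)

    # Second lateral strip across the mouth: width Vx, height Vy, centered at (Vc_x, Vc_y).
    Vx = abs(mouthl[0] - mouthr[0])
    Vy = mouthd_y - (mouthlu_y + mouthru_y) / 2
    zero_rect((mouthl[0] + mouthr[0]) / 2, (mouthl[1] + mouthr[1]) / 2, Vx, Vy)
    return img
```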
Step 802, preprocessing the image data of the effective area to obtain better image quality and improve the detection accuracy.
The image preprocessing includes: image equalization by means of histogram equalization; normalization by scaling the image to an N×N pixel size using bilinear interpolation, for example 128×128 pixels; filtering the image to reduce noise, including but not limited to Gaussian filtering, mean filtering and median filtering; and setting the pixel values of the mask regions covering the eye and mouth images to 0. The equalization, normalization, noise reduction and mask-region processing need not follow a strict order; preferably, setting the mask-region pixel values to 0 is performed before image equalization to increase the image processing speed.
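A minimal OpenCV sketch of this preprocessing chain, assuming an 8-bit grayscale effective-area image and the 128×128 example size; the 3×3 Gaussian kernel is an assumed choice, since the text does not fix the filter parameters.

```python
# Illustrative sketch of the preprocessing in step 802.
import cv2

def preprocess(roi):
    """roi: grayscale (uint8) image data of the effective area, mask regions already zeroed."""
    roi = cv2.equalizeHist(roi)                                        # histogram equalization
    roi = cv2.resize(roi, (128, 128), interpolation=cv2.INTER_LINEAR)  # bilinear normalization to N x N
    roi = cv2.GaussianBlur(roi, (3, 3), 0)                             # noise-reduction filtering
    return roi
```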
Step 803, inputting the preprocessed image data into a trained convolutional neural network classifier, so that the trained convolutional neural network classifier performs classification detection on the input image data, and obtaining a detection result from the output of the convolutional neural network classifier.
Because the trained convolutional neural network classifier can identify the image with the vein blood vessel tissue as a living body, when the image to be detected comprises the image of the vein blood vessel tissue, the image is judged to be the living body; otherwise, it is determined as a non-living body.
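As an illustration only, a trained classifier could be applied to a preprocessed effective-area image roughly as follows. PyTorch is an assumed framework (the patent does not name one), and the ordering of the two output classes is likewise an assumption.

```python
# Illustrative sketch: classifying a preprocessed effective-area image.
import torch

def is_live(preprocessed_roi, model):
    """preprocessed_roi: 128x128 float numpy array; model: trained classifier in eval mode."""
    x = torch.from_numpy(preprocessed_roi).float().view(1, 1, 128, 128)
    with torch.no_grad():
        probs = model(x)               # soft-max output, assumed order [p_non_live, p_live]
    return probs[0, 1].item() > 0.5    # treat the image as a living body if the live score dominates
```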
To obtain a machine learning model for identifying vein features, the model needs to be trained. Taking the neural network model as an example, the final effect of the neural network algorithm depends on the selection of the sample library. The samples include positive samples and negative samples: the positive samples are images acquired from real human faces, and the negative samples include, but are not limited to, non-living face images such as photos, silica-gel masks, 3D models and images stored on a terminal.
In this embodiment, the positive samples are face images in the near-infrared band (NIR band, 780 nm-1100 nm). Images in this band are little affected by external light, the features are stable, information such as the pupils of the face is distinct, and the images can be stored directly by the image sensor as 256-level, 8-bit grayscale images. Preferably, face images in dominant wavelength bands such as 850 nm and 940 nm may be employed. The resolution of the positive sample images may be the common VGA, 480P, 720P and 1080P resolutions, including images whose resolution is changed by cropping rather than by an interpolation algorithm, so as to reduce the deviation in recognition accuracy of the neural network model caused by differences in face image resolution. Face images are collected under different environments, including different illumination conditions such as indoor illumination, outdoor illumination and strong backlighting, so as to reduce misjudgments caused by different scenes.
Extraction of effective-area image data and image preprocessing are performed on both positive and negative samples. The processing of the samples is described below.
The effective area of the sample image is determined according to the embodiment of determining the effective area of the image to be detected in step 801, for example, if the effective area is determined according to the first embodiment adopted at the time of detection, the effective area is determined by the sample according to the first embodiment also adopted at the time of training. Then, image data is extracted according to the determined effective area.
After extracting the image data in the effective area according to the above embodiment, the extracted effective-area image data is equalized to increase the contrast of the image; in particular, histogram equalization may be adopted. Normalization is then performed, scaling the image to an N×N pixel size using bilinear interpolation. Further, filtering may be performed after equalization depending on the image quality. The preprocessing order and processing parameters for the samples are the same as those in step 802.
Through the above-described processing, samples including positive samples and negative samples are obtained. Referring to fig. 7, fig. 7 is a schematic diagram of training the neural network classifier, in which the switch represents alternation between inputting a set of positive samples and inputting a set of negative samples; positive samples and negative samples are alternately input into the neural network classifier to be trained, network parameters are generated, and the trained network parameters of the neural network classifier are saved. In this embodiment, the neural network classifier is a convolutional neural network (CNN) classifier comprising a number of convolutional layers, pooling layers and fully-connected layers. Preferably, it may include 3 convolutional layers, 2 max-pooling layers, 1 fully-connected layer and 1 soft-max layer; the first 2 convolutional layers are each followed by a max-pooling layer, the input layer is a 128×128 vector, and the soft-max layer outputs the liveness value.
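A hedged PyTorch sketch of a classifier with the stated layer counts (3 convolutional layers, 2 max-pooling layers, 1 fully-connected layer and a soft-max output) and of the alternating positive/negative training described for fig. 7 follows; the channel widths, kernel sizes, optimizer, learning rate and file name are assumptions, not values from the patent.

```python
# Illustrative sketch: CNN classifier and alternating positive/negative training.
import torch
import torch.nn as nn

class VeinLivenessCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # conv1 + max-pool
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # conv2 + max-pool
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),                   # conv3
        )
        self.fc = nn.Linear(64 * 32 * 32, 2)   # fully-connected layer over 32x32 feature maps

    def forward(self, x):                       # x: (batch, 1, 128, 128)
        x = self.features(x).flatten(1)
        return torch.softmax(self.fc(x), dim=1) # soft-max output (non-live, live)

def train_alternating(model, positive_batches, negative_batches, epochs=10):
    """Alternately feed a batch of positive samples (label 1) and negative samples (label 0)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.NLLLoss()
    for _ in range(epochs):
        for pos, neg in zip(positive_batches, negative_batches):
            for batch, label in ((pos, 1), (neg, 0)):
                labels = torch.full((batch.size(0),), label, dtype=torch.long)
                loss = loss_fn(torch.log(model(batch) + 1e-8), labels)
                opt.zero_grad(); loss.backward(); opt.step()
    torch.save(model.state_dict(), "vein_liveness_cnn.pt")  # save the trained network parameters
```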
The embodiment of the application carries out living body detection based on the facial vein distribution information through the trained neural network classifier, and the convolutional neural network classifier is convenient to train and high in recognition accuracy by means of the large difference between the image comprising the facial vein distribution information and the image without the facial vein distribution information, so that higher accuracy can be obtained when the trained convolutional neural network classifier is applied to recognize and detect the image to be detected; in addition, vein distribution information is very stable and difficult to forge, so that the stability and the anti-attack capability of living body detection can be effectively ensured, and the safety is improved; the detection method does not need the cooperation of the user according to the specific instruction, has good user experience and high detection efficiency.
The application provides a living body detection device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor is configured to realize the steps of the detection method.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
The embodiment of the application also provides a computer readable storage medium, wherein the storage medium stores a computer program, and the computer program realizes the following steps when being executed by a processor:
based on the infrared image of the object to be detected, image data in the effective area is extracted,
the extracted effective area image data is input into a trained neural network classifier for recognition,
obtaining a detection result from the output of the trained neural network classifier,
the trained neural network classifier classifies whether the images contain vein distribution information of biological characteristics or not, the infrared images of targets to be detected containing the vein distribution information of the biological characteristics are identified as living bodies, and the infrared images of targets to be detected which do not contain the vein distribution information of the biological characteristics are identified as non-living bodies.
For the apparatus/network side device/storage medium embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and the relevant points are referred to in the description of the method embodiment.
It should be noted that, in the embodiment of the neural network classifier training method provided by the application, the neural network classifier is not limited to the CNN model, and other data models requiring training can be adopted.
In engineering application, the specific shapes and positions of the effective area and the mask area can be designed and adjusted by combining the characteristics of the biological characteristics of the image to be detected.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather to enable any modification, equivalent replacement, improvement or the like to be made within the spirit and principles of the application.

Claims (8)

1. A living body detecting method, characterized in that the method comprises,
based on the infrared image of the object to be detected, image data in the effective area is extracted,
the extracted effective area image data is input into a trained neural network classifier for recognition,
obtaining a detection result from the output of the trained neural network classifier,
wherein,
the infrared image of the object to be detected comprises a facial image of the facial feature,
the location of the active area is determined as follows:
determining the width of an effective rectangular area according to the distance between pupils, determining the height of the effective rectangular area according to the width to obtain the effective rectangular area at least limited except the area above eyes,
acquiring left eye pupil coordinates, right eye pupil coordinates, left mouth corner coordinates and right mouth corner coordinates in the face image, calculating the average value of the 4 coordinates to obtain the center position of the face,
determining the position of the effective rectangular area according to the center position of the face and the range defined by the effective rectangular area, with the aim of increasing the number of image pixels defined by the effective rectangular area,
the determining the position of the effective rectangular area according to the center position of the face and the range defined by the effective rectangular area, with the aim of increasing the number of image pixels defined by the effective rectangular area, includes,
determining a first position according to the fact that the proportion of the effective rectangular area occupied by the height from the ordinate of the center of the face to the chin is greater than a first threshold value, and determining a second position according to the fact that the center of the face is on the central line of the effective rectangular area in the width direction or deviates from the central line by less than a second threshold value, wherein the image pixels defined by the effective rectangular area comprise the cheek region from below the eyes to part of or the whole chin, and the position of the effective rectangular area is thereby determined;
the trained neural network classifier classifies whether the images contain vein distribution information of biological characteristics or not, the infrared images of the targets to be detected containing the vein distribution information of the biological characteristics are identified as living bodies, and the infrared images of the targets to be detected, which do not contain the vein distribution information of the biological characteristics, are identified as non-living bodies.
2. The detection method according to claim 1, further comprising preprocessing the extracted effective area image data by one or any combination of the following: equalizing, normalizing and filtering;
the neural network classifier is trained alternately by a set of positive samples, which are infrared images containing venous distribution information of biological features, and a set of negative samples, which are any images not containing venous distribution information of biological features.
3. The method of claim 1, wherein determining the width of an effective rectangular area based on the inter-pupil distance comprises taking the product of the inter-pupil distance and the first coefficient as the width of the effective rectangular area,
the determining the height of the effective rectangular area according to the width comprises taking the product of the second coefficient and the width of the effective rectangular area as the height of the effective rectangular area,
wherein the first coefficient is greater than 1 and the second coefficient is determined from the height below the eye to a portion or the entire chin.
4. The detection method according to claim 1, wherein the extracting the image data in the effective area further comprises,
establishing a mouth mask area for setting a pixel value of the mouth area to be 0, wherein the mouth mask area is a rectangular area, the width of the rectangular area is the distance of the horizontal coordinate between the left mouth angle and the right mouth angle, and the height of the rectangular area is the product of the rectangular width and a third coefficient; the distance between the center of the rectangular area and the center of the mouth is smaller than a set second distance threshold value, wherein a third coefficient is smaller than 1;
the pixel value of each pixel point in the mouth mask region is set to 0.
5. The detecting method of claim 1, wherein the extracting the image data in the effective area further includes creating an eye mask area for setting an eye area pixel value of 0, the eye mask area including two rectangular areas of the same area size, a width of the rectangular area being a product of an inter-pupillary distance and a fourth coefficient, a height being a product of the width and a fifth coefficient, a distance between a center of the rectangular area and a pupil being smaller than a set third distance threshold, wherein the fourth coefficient and the fifth coefficient are each smaller than 1,
the pixel value of each pixel point in the eye mask area is set to 0.
6. A training method of a neural network classifier is characterized in that the method comprises the following steps,
extracting image data in an effective area based on sample image data, wherein the samples comprise positive samples and negative samples, the positive samples are infrared images containing vein distribution information of biological characteristics, and the negative samples are any images not containing the vein distribution information of the biological characteristics;
the effective area image data extracted based on the samples is alternately input to the neural network classifier for training according to a set of positive samples and a set of negative samples,
saving the network parameters of the trained neural network classifier;
wherein,
the sample image includes a facial image of the facial feature,
the location of the active area is determined as follows:
determining the width of an effective rectangular area according to the distance between pupils, determining the height of the effective rectangular area according to the width to obtain the effective rectangular area at least limited except the area above eyes,
acquiring left eye pupil coordinates, right eye pupil coordinates, left mouth corner coordinates and right mouth corner coordinates in the face image, calculating the average value of the 4 coordinates to obtain the center position of the face,
determining the position of the effective rectangular area according to the center position of the face and the range defined by the effective rectangular area, with the aim of increasing the number of image pixels defined by the effective rectangular area,
the determining the position of the effective rectangular area according to the center position of the face and the range defined by the effective rectangular area, with the aim of increasing the number of image pixels defined by the effective rectangular area, includes,
the first position is determined according to the fact that the height from the ordinate of the center of the face to the chin occupies the effective rectangular area and is larger than a first threshold value, the second position is determined according to the fact that the center of the face is on the central line of the width direction of the effective rectangular area or is deviated from the central line and smaller than a second threshold value, the image pixels defined by the effective rectangular area comprise cheek areas from the lower part of eyes to part of or the whole chin, and the position of the effective rectangular area is determined.
7. A living body detecting device is characterized by comprising a memory and a processor, wherein,
the memory is stored with an application program,
a processor executing the application implements the living body detection steps as claimed in any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the living body detection method according to any one of claims 1 to 5.
CN201911197600.0A 2019-11-29 2019-11-29 Living body detection method and device Active CN112883758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911197600.0A CN112883758B (en) 2019-11-29 2019-11-29 Living body detection method and device

Publications (2)

Publication Number Publication Date
CN112883758A CN112883758A (en) 2021-06-01
CN112883758B (en) 2023-08-25

Family

ID=76039394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911197600.0A Active CN112883758B (en) 2019-11-29 2019-11-29 Living body detection method and device

Country Status (1)

Country Link
CN (1) CN112883758B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001202505A (en) * 2000-01-21 2001-07-27 Shinichi Komatsu Method for extracting face
JP2006099614A (en) * 2004-09-30 2006-04-13 Toshiba Corp Living body discrimination apparatus and living body discrimination method
JP2007286923A (en) * 2006-04-17 2007-11-01 Kao Corp Face part position detection method and system
GB0805228D0 (en) * 2007-03-22 2008-04-30 Artnix Inc Apparatus and method for detecting face region
JP2014049863A (en) * 2012-08-30 2014-03-17 Canon Inc Image processing device, control method therefor, control program, and imaging device
CN107292230A (en) * 2017-05-09 2017-10-24 华南理工大学 Embedded finger vein identification method based on convolutional neural network and having counterfeit detection capability
CN107766786A (en) * 2016-08-23 2018-03-06 三星电子株式会社 Activity test method and active testing computing device
CN109255322A (en) * 2018-09-03 2019-01-22 北京诚志重科海图科技有限公司 A kind of human face in-vivo detection method and device
CN110443102A (en) * 2018-05-04 2019-11-12 北京眼神科技有限公司 Living body faces detection method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design of an identity authentication system based on human finger vein features; Zhao Jianguo; Information & Communications (《信息通信》); full text *

Also Published As

Publication number Publication date
CN112883758A (en) 2021-06-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant