CN115066712A - Identity recognition method, terminal, server and system

Identity recognition method, terminal, server and system

Info

Publication number
CN115066712A
Authority
CN
China
Prior art keywords: face, image, target, recognized, user
Legal status: Pending
Application number: CN202080003701.4A
Other languages: Chinese (zh)
Inventor: 许景涛
Current Assignee: BOE Technology Group Co Ltd
Original Assignee: BOE Technology Group Co Ltd
Application filed by BOE Technology Group Co Ltd
Publication of CN115066712A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an identity recognition method, a terminal, a server and a system, relating to the field of computer technologies. In the method, at least one frame of a first image to be recognized is extracted from a video stream, a target image is selected from the extracted frames, the first face region corresponding to the target image is determined as the target face region, the target image is cropped according to the target face region, and the cropped image is sent to a server for face recognition to determine the identity information of the target user. When the captured video stream contains the face of the target user, the user's identity information can be obtained promptly, so that staff can tailor a marketing scheme to that identity information in time, which improves marketing efficiency and the user's experience.

Description

Identity recognition method, terminal, server and system

Technical Field
The present application relates to the field of computer technologies, and in particular, to an identity recognition method, a terminal, a server, and a system.
Background
In daily life and work, people often need to visit places such as bank business halls to handle related business.
At present, before handling business, a user takes a number from an automatic ticketing machine and waits in line until called to the corresponding counter. Only when the user presents identity documents at the counter can bank staff learn the user's identity information. For example, the identity of a bank VIP user can be determined only once that user reaches the counter to handle business.
In this take-a-number-then-queue mode, staff cannot learn a user's identity information in time, so a marketing scheme cannot be tailored to the user promptly, and the user experience is poor.
Summary
Some embodiments of the present application provide the following technical solutions:
In a first aspect, an identity recognition method is provided, applied to a first terminal, the method including:
extracting at least one frame of a first image to be recognized from a video stream;
detecting, based on a target detection model, a first face region where the face in each frame of the first image to be recognized is located;
selecting a target image from the frames of the first image to be recognized, and determining the first face region corresponding to the target image as a target face region;
cropping the target image according to the target face region to obtain a cropped image;
and sending the cropped image to a server, so that the server recognizes the cropped image to determine the identity information of a target user.
In a second aspect, an identity recognition method is provided, applied to a server, the method including:
receiving a cropped image sent by a first terminal, the cropped image being obtained by the first terminal cropping a target image according to a target face region;
extracting first face feature points from the cropped image;
comparing the first face feature points with the face feature points stored in the server to determine a user identifier corresponding to a target user;
and acquiring the identity information of the target user corresponding to the user identifier.
In a third aspect, a terminal is provided, the terminal being a first terminal that includes:
a first to-be-recognized image extraction module configured to extract at least one frame of a first image to be recognized from a video stream;
a first face region detection module configured to detect, based on a target detection model, the first face region where the face in each frame of the first image to be recognized is located;
a target face region determination module configured to select a target image from the frames of the first image to be recognized and determine the first face region corresponding to the target image as a target face region;
a target image cropping module configured to crop the target image according to the target face region to obtain a cropped image;
and a cropped image sending module configured to send the cropped image to a server, so that the server recognizes the cropped image to determine the identity information of a target user.
In a fourth aspect, a server is provided, including:
a cropped image receiving module configured to receive a cropped image sent by a first terminal, the cropped image being obtained by the first terminal cropping a target image according to a target face region;
a first face feature point extraction module configured to extract first face feature points from the cropped image;
a user identifier determination module configured to compare the first face feature points with the face feature points stored in the server and determine a user identifier corresponding to a target user;
and an identity information acquisition module configured to acquire the identity information of the target user corresponding to the user identifier.
In a fifth aspect, a terminal is provided, including a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the identity recognition method described above.
In a sixth aspect, a computer-readable medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the identity recognition method described above.
In a seventh aspect, a server is provided, including a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the identity recognition method described above.
In an eighth aspect, a computer-readable medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the identity recognition method described above.
In a ninth aspect, an identity recognition system is provided, including a camera, a second terminal, a third terminal, the first terminal described above and the server described above;
the camera is configured to capture a video stream and send the video stream to the first terminal;
the second terminal is configured to receive the identity information of a target user sent by the server, so as to alert staff that the target user is visiting;
the third terminal is configured to send a face image and identity information of a registered user to the server, and to receive registration error information returned by the server when second face feature points contained in the face image are already stored in the server.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer and implementable according to this description, and to make the above and other objects, features and advantages of the present application easier to understand, detailed embodiments of the present application are set forth below.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 schematically shows a flowchart of an identity recognition method according to an embodiment of the present application;
Fig. 2 schematically shows a detailed flowchart of an identity recognition method according to an embodiment of the present application;
Fig. 3 schematically shows a flowchart of another identity recognition method according to an embodiment of the present application;
Fig. 4 schematically shows a flowchart of the registration process for registering a user's face feature points and identity information in an embodiment of the present application;
Fig. 5 schematically shows a structural block diagram of a terminal according to an embodiment of the present application;
Fig. 6 schematically shows a structural block diagram of a server according to an embodiment of the present application;
Fig. 7 schematically shows a structural block diagram of an identity recognition system according to an embodiment of the present application;
Fig. 8 schematically shows a block diagram of a computing processing device for performing a method according to the present application; and
Fig. 9 schematically shows a storage unit for holding or carrying program code implementing a method according to the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are some, but not all, embodiments of the present application. All other embodiments derived by those skilled in the art from the embodiments given herein without creative effort fall within the protection scope of the present application.
Referring to Fig. 1, a flowchart of an identity recognition method according to an embodiment of the present application is shown. The method is applied to a first terminal and specifically includes the following steps:
Step 101: at least one frame of a first image to be recognized is extracted from a video stream.
In the embodiment of the application, the camera captures the video stream in real time and sends it to the first terminal. The first terminal reads the video stream in a multithreaded manner and extracts video frames from it according to a preset extraction scheme.
Specifically, every frame of the video stream may be extracted to obtain a plurality of video frames, or one frame may be extracted every preset number of frames, for example one frame every 10 frames or one frame every 5 frames.
Then, at least one first image to be recognized is selected from the obtained video frames. Only one video frame may be selected, in which case one frame of the first image to be recognized is extracted from the video stream; or at least two video frames may be selected, in which case at least two frames are extracted.
For example, suppose the video stream lasts 1 s at 60 frames per second and one frame is extracted every 10 frames; then 6 video frames can be extracted: the 1st, 11th, 21st, 31st, 41st and 51st frames of the video stream. The first video frame alone may be selected as the first image to be recognized (one frame), or the first three video frames may be selected (3 frames).
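The patent describes this frame sampling only in prose; the following is a minimal sketch of how it could look in Python with OpenCV. The stream source, stride and frame cap are illustrative assumptions, not values fixed by the application.

```python
import cv2

def sample_frames(stream_source, stride=10, max_frames=3):
    """Keep one frame every `stride` frames of the stream (step 101 sketch)."""
    cap = cv2.VideoCapture(stream_source)  # RTSP URL or device index (assumed)
    sampled, index = [], 0
    while cap.isOpened() and len(sampled) < max_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:  # e.g. the 1st, 11th, 21st, ... frames
            sampled.append(frame)
        index += 1
    cap.release()
    return sampled

# Three "first images to be recognized", as in the example above:
first_images = sample_frames("rtsp://camera.local/stream", stride=10, max_frames=3)
```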
Step 102: a first face region where the face in each frame of the first image to be recognized is located is detected based on a target detection model.
In the embodiment of the application, a pre-trained target detection model, a neural network model, is stored on the first terminal. The model is trained on a number of sample images with manually annotated face regions.
After extracting at least one frame of the first image to be recognized from the video stream, the first terminal inputs each frame into the target detection model to obtain the first face region where the face in that frame is located.
In practice, the first face region is a rectangular box, represented by its position information, for example the coordinates of the upper-left corner of the rectangle together with its width and height.
Step 103: a target image is selected from the frames of the first image to be recognized, and the first face region corresponding to the target image is determined as the target face region.
In the embodiment of the application, after detecting the first face region in each frame of the first image to be recognized, the first terminal selects a target image from those frames and determines its first face region as the target face region.
When only one frame of the first image to be recognized exists, that frame is the target image and its first face region is the target face region. When there are at least two frames, one of them is selected as the target image according to the relationship between their first face regions, and the first face region of that frame is the target face region.
For example, if the first image to be recognized consists of 3 frames, namely the first, second and third video frames extracted from the video stream, the third video frame may be selected as the target image, and its first face region is then the target face region.
Step 104: the target image is cropped according to the target face region to obtain a cropped image.
In the embodiment of the application, after determining the target image and its target face region, the first terminal crops the target image according to the target face region to obtain a cropped image containing the face in the target face region.
Step 105: the cropped image is sent to a server, so that the server recognizes the cropped image and determines the identity information of the target user.
In the embodiment of the application, the first terminal sends the cropped image to the server. The server extracts the first face feature points from the cropped image, compares them with the face feature points it stores to determine the user identifier corresponding to the target user, and then obtains the identity information of the target user corresponding to that identifier.
The first face feature points may be at least one feature point of the face, such as the nose, left eye, right eye or mouth. The user identifier is associated with both the target user's face feature points and the target user's identity information stored on the server; it is in effect an index number that links the stored feature points to the stored identity information. The identity information of the target user includes the user's name, age, sex, ID number, mobile phone number, occupation, education and the like.
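The role of the user identifier as an index number is easiest to see in a toy data layout. The structure below is purely illustrative; the patent does not prescribe any storage format, and the field names and values are assumptions.

```python
# The user identifier acts as an index key joining the two stores.
face_feature_store = {
    "user-0001": [0.12, -0.48, 0.33],  # stored face feature points (toy values)
}
identity_store = {
    "user-0001": {"name": "...", "age": 35, "occupation": "..."},
}

def identity_for(user_id):
    """Server side of step 105: user identifier -> identity information."""
    return identity_store.get(user_id)
```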
Once the camera captures a video stream containing the target user's face, the cooperating first terminal and server can determine the target user's identity information directly and quickly, so staff can customize a marketing scheme for that user in time, improving marketing efficiency and the user's experience.
In addition, the target image is cropped according to the target face region and only the cropped image is sent to the server, rather than the whole target image, which reduces the server's computation load. Moreover, the extraction of the target image and the cropping are deployed on the first terminal, while the recognition and comparison of the first face feature points and the query of identity information are deployed on the server. Integrating all of these functions in one device would create heavy bandwidth pressure; splitting them between the first terminal and the server effectively relieves the bandwidth pressure of video-stream transmission, i.e. the network bandwidth pressure.
In the embodiment of the application, at least one frame of a first image to be recognized is extracted from a video stream, a target image is selected, its first face region is determined as the target face region, the target image is cropped according to that region, and the cropped image is sent to the server for face recognition to determine the identity information of the target user. When the captured video stream contains the target user's face, the identity information can be learned in time, staff can tailor a marketing scheme to it promptly, marketing efficiency improves, and the user's experience improves.
Referring to Fig. 2, a detailed flowchart of an identity recognition method according to an embodiment of the present application is shown. The method is applied to a first terminal and specifically includes the following steps:
Step 201: at least two frames of a first image to be recognized are extracted from the video stream, one frame every preset number of frames.
In the embodiment of the application, the camera captures the video stream in real time and sends it to the first terminal. The first terminal extracts one frame every preset number of frames to obtain a plurality of video frames, and then selects at least two of them as the first image to be recognized.
Specifically, among the at least two frames of the first image to be recognized, any two adjacent frames may be consecutive selections that are actually separated by the preset number of frames in the video stream.
For example, 3 frames of the first image to be recognized are extracted from the video stream: the 1st, 11th and 21st frames of the stream. These three selections are consecutive, with no other extracted video frame between them, and any two adjacent selections are separated by the preset number of 10 frames in the video stream.
Step 202: each frame of the first image to be recognized is compressed so that the compressed image is smaller than the image before compression.
In the embodiment of the application, after extracting at least two frames of the first image to be recognized, the first terminal compresses each frame so that its size becomes smaller; specifically, the image can be scaled down proportionally.
For example, if the width and height of a frame before compression are W and H, its width and height after compression may be W/3 and H/3.
Step 203: each compressed first image to be recognized is input into an SSD model to obtain the first face region where the face in that frame is located.
In the embodiment of the application, a pre-trained target detection model, a neural network model, is stored on the first terminal; the model may be an SSD (Single Shot MultiBox Detector) model.
After compression, the first terminal inputs each frame into the SSD model, which outputs the position information of the first face region of each frame, for example the upper-left coordinates, width and height of the corresponding rectangle, as well as the category of the region.
Compressing the first image to be recognized before feeding it to the SSD model shortens the detection time.
The SSD model consists of a backbone network, a multi-scale detection sub-network and an NMS (non-maximum suppression) stage connected in sequence. The backbone may be a deep convolutional neural network such as VGG16, and the multi-scale detection sub-network includes several convolutional layers connected in sequence, each followed by a classification layer and a position-regression layer.
A compressed first image to be recognized is processed as follows: the backbone extracts a feature map; the multi-scale convolutional layers produce class probability scores and position offsets for prior boxes of different scales and aspect ratios; the scores, offsets and prior boxes pass through the classification and position-regression layers; finally, NMS removes redundant boxes, and the box with the highest confidence is kept as the rectangle of the first face region.
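The patent leaves the SSD implementation open. One concrete stand-in is the ResNet-10 SSD face detector shipped with OpenCV's dnn module; the sketch below assumes its standard model files (deploy.prototxt and the res10 caffemodel), which are not part of the application.

```python
import cv2
import numpy as np

# Off-the-shelf SSD face detector (an assumption; any SSD variant would do).
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

def detect_face(image, conf_threshold=0.5):
    """Return the highest-confidence face box as (x, y, w, h), or None."""
    h, w = image.shape[:2]
    # Resizing to the network input plays the role of the compression step.
    blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7): [.., conf, x1, y1, x2, y2]
    best = None
    for i in range(detections.shape[2]):
        conf = float(detections[0, 0, i, 2])
        if conf < conf_threshold:
            continue
        x1, y1, x2, y2 = (detections[0, 0, i, 3:7] *
                          np.array([w, h, w, h])).astype(int)
        if best is None or conf > best[0]:
            best = (conf, (x1, y1, x2 - x1, y2 - y1))
    return best[1] if best else None
```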
Step 204: for the at least two frames of the first image to be recognized, the intersection over union between the first face region of each frame and the first face region of the adjacent previous frame is calculated.
In the embodiment of the application, for the at least two frames, the Intersection over Union (IoU) between the first face region of each frame and that of the adjacent previous frame is calculated. The IoU is the ratio of the area of overlap between the two first face regions to the area of their union.
For example, 3 frames of the first image to be recognized are extracted from the video stream: the 1st, 11th and 21st frames, with first face regions 1, 2 and 3 respectively. The IoU of region 1 with region 2 and the IoU of region 2 with region 3 are then calculated.
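A direct implementation of this IoU over (x, y, w, h) boxes, matching the rectangle representation of step 102, could look as follows (a minimal sketch, not the patent's code):

```python
def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) face boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

# Two boxes that barely move between frames overlap heavily:
print(iou((100, 100, 80, 80), (104, 102, 80, 80)))  # ~0.86
```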
Step 205: when every IoU corresponding to the at least two frames of the first image to be recognized is greater than or equal to a first set threshold, any one frame is selected as the target image, and its first face region is determined as the target face region.
In the embodiment of the application, when every such IoU is greater than or equal to the first set threshold, it is concluded that the first terminal has not misdetected the first face regions, i.e. the detected face regions are accurate. Any one of the frames may then be selected as the target image, usually the last frame, and its first face region becomes the target face region. The first set threshold can be chosen manually, for example 0.5.
For example, if the IoU of region 1 with region 2 is 0.6 and that of region 2 with region 3 is 0.8, both exceed the threshold of 0.5. From the three extracted frames (the 1st, 11th and 21st frames of the stream), the 21st frame is selected as the target image, and its first face region 3 becomes the target face region.
It should be noted that if any IoU is below the first set threshold, the detected face regions are judged inaccurate, i.e. the first terminal may have mistaken another object for a face. The erroneous first face regions are not processed further, and the steps from extracting first images to be recognized from the video stream onward are executed again.
Step 206: the coordinate positions of the first face key points in the target face region are detected.
In the embodiment of the application, after determining the target image and target face region, the first terminal detects the first face key points within the target face region through a face detection algorithm and determines their coordinate positions.
The first face key points include the left eye, right eye, nose, left mouth corner and right mouth corner, and their coordinate positions are their respective coordinates in the target image.
By first fixing the target face region and then detecting key points only inside it, the computation on the first terminal is reduced.
Step 207: a region containing the first face key points is cut out of the target image according to their coordinate positions, yielding the cropped image.
In the embodiment of the application, after detecting the coordinates of the first face key points in the target face region, the first terminal cuts the region containing those key points out of the target image to obtain the cropped image.
Specifically, the target image is cropped to a preset crop size, so that the cropped image has that size and contains all the first face key points.
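The cropping of step 207 can be sketched as follows. The key-point coordinates are assumed to come from whatever landmark detector the first terminal uses; the crop size and the centering strategy are illustrative choices, not fixed by the patent.

```python
import numpy as np

def crop_around_keypoints(image, keypoints, crop_size=(112, 112)):
    """Cut a preset-size patch centred on the face key points.

    `keypoints` is a list of (x, y) pairs (left eye, right eye, nose,
    left/right mouth corner). Assumes the preset size is large enough
    to contain all the key points; the patch is clamped to the image.
    """
    pts = np.asarray(keypoints)
    cx, cy = pts.mean(axis=0).astype(int)  # centroid of the key points
    cw, ch = crop_size
    h, w = image.shape[:2]
    x0 = int(np.clip(cx - cw // 2, 0, max(w - cw, 0)))
    y0 = int(np.clip(cy - ch // 2, 0, max(h - ch, 0)))
    return image[y0:y0 + ch, x0:x0 + cw]
```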
Step 208: the cropped image is sent to the server, so that the server recognizes it and determines the identity information of the target user.
This step is similar to step 105 of the first embodiment and is not repeated here.
Further, the following steps are performed after step 208 (a code sketch of this tracking loop follows the explanatory paragraphs below):
receiving the user identifier of the target user sent by the server, and storing the user identifier together with its corresponding target face region;
sequentially extracting N frames of a second image to be recognized from the video stream, where a second image to be recognized is a video frame located after the first image to be recognized in the stream and N is a positive integer greater than 1;
detecting, based on the target detection model, the second face region where the face in each frame of the second image to be recognized is located;
calculating the IoU between the target face region and the second face region of each frame of the second image to be recognized;
when, among the second face regions of the N consecutively extracted frames, there is one whose IoU with the target face region is greater than or equal to a second set threshold, executing the step of sequentially extracting N frames of the second image to be recognized from the video stream and the subsequent steps again;
and when the IoU of every one of the N consecutively extracted second face regions with the target face region is below the second set threshold, deleting the user identifier and the target face region, and executing the step of extracting at least one frame of the first image to be recognized from the video stream and the subsequent steps again.
In the embodiment of the application, after the first terminal sends the cropped image, the server extracts the first face feature points, compares them with its stored feature points, determines the user identifier of the target user, and sends that identifier back to the first terminal. The first terminal receives it and caches the user identifier together with its corresponding target face region.
The first terminal acquires the camera's video stream in real time and splits it into video frames as needed. The video frames located after the first image to be recognized are called second images to be recognized, so N frames of the second image to be recognized can be extracted from the stream in sequence.
For example, if the first image to be recognized comprises the first, second and third video frames (the 1st, 11th and 21st frames of the stream), then the video frames after the third video frame are second images to be recognized; the fourth, fifth and sixth video frames (the 31st, 41st and 51st frames of the stream) are all second images to be recognized.
After extracting the second images to be recognized, the first terminal inputs each frame into the target detection model to obtain the second face region where the face in that frame is located. The target detection model may again be an SSD model; the detection of the second face region mirrors that of the first face region and is not repeated here.
The first terminal then calculates the IoU between the cached target face region and the second face region of each frame of the second image to be recognized.
When, among the second face regions of the N consecutively extracted frames, there is one whose IoU with the target face region is greater than or equal to the second set threshold, it is determined that the target user has not been lost from tracking. The terminal then keeps extracting N frames of the second image to be recognized from the stream and repeating the subsequent steps: detecting the second face regions, computing the IoU against the target face region, and comparing it with the second set threshold again.
When the IoU of every one of the N consecutively extracted second face regions with the target face region is below the second set threshold, it is determined that the target user has been lost from tracking. The cached user identifier and target face region are deleted, and step 201 and the subsequent steps are executed again: first images to be recognized are extracted from the stream anew, their first face regions are detected, a target image and target face region are selected, the target image is cropped, and the cropped image is sent to the server to recognize the target user's identity information.
The second set threshold may equal the first set threshold or differ from it; the number N of consecutively extracted frames of the second image to be recognized can be set manually according to the actual situation, for example N = 20.
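Pulling the pieces together, the tracking decision can be sketched as below, reusing the iou helper from the step 204 sketch. The loop structure and names are assumptions, not the patent's code.

```python
def still_tracked(target_box, recent_boxes, iou_threshold=0.5):
    """True if any of the last N detected face boxes still overlaps the
    cached target face region enough (second set threshold test)."""
    return any(box is not None and iou(target_box, box) >= iou_threshold
               for box in recent_boxes)

def tracking_step(cache, recent_boxes, n=20):
    """One round of the tracking loop over the N most recent second images.

    `cache` holds {"user_id": ..., "target_box": ...}. On a miss across
    all N frames, the cache is cleared and recognition restarts (step 201).
    """
    assert len(recent_boxes) == n
    if still_tracked(cache["target_box"], recent_boxes):
        return "keep-tracking"   # keep extracting second images to be recognized
    cache.clear()                # delete user identifier and target face region
    return "restart-recognition"
```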
In the embodiment of the application, after determining the target user's user identifier, the server sends it to the first terminal, which stores the identifier and the target face region. Subsequent second images to be recognized extracted from the stream are then judged against the target face region to track the target user. As long as the target user is not lost from tracking, the first terminal neither keeps cropping the second images to be recognized nor sends cropped images to the server to re-identify the target user, which avoids the sharp rise in server load that sending every cropped image would cause. The embodiment therefore reduces the computational load of both the terminal and the server while the target user is being tracked, and correspondingly speeds up the identification of each target user's identity information. When the target user is lost from tracking, the identity information is detected anew by the first terminal and the server.
In the embodiment of the application, at least two frames of a first image to be recognized are extracted from the video stream. Provided that the IoU between the first face region of each frame and that of the adjacent previous frame is greater than or equal to the first set threshold, a target image is selected from the frames and its first face region is determined as the target face region. The coordinates of the first face key points within the target face region are then detected, the target image is cropped according to those coordinates, and the cropped image is sent to the server for face recognition to determine the identity information of the target user. When the captured video stream contains the target user's face, the identity information can be learned in time, staff can tailor a marketing scheme promptly, marketing efficiency improves, and the user experience improves. Moreover, because the target image is selected only when every IoU is at least the first set threshold, the situation in which the target detection model mistakes the region of some other object for a first face region, making the target user's identity undetectable later, is avoided.
Referring to Fig. 3, a flowchart of another identity recognition method according to an embodiment of the present application is shown. The method is applied to a server and specifically includes the following steps:
Step 301: a cropped image sent by a first terminal is received; the cropped image is obtained by the first terminal cropping a target image according to a target face region.
In the embodiment of the application, the first terminal acquires the camera's video stream in real time, extracts at least one frame of a first image to be recognized from it, detects the first face region of each frame based on a target detection model, selects a target image and determines its first face region as the target face region, crops the target image according to that region, and finally sends the cropped image to the server.
Correspondingly, the server receives the cropped image sent by the first terminal; specifically, the cropped image contains the first face key points of the target face region in the target image.
Step 302: the first face feature points in the cropped image are extracted.
In the embodiment of the application, after receiving the cropped image, the server extracts the first face feature points from it; these may be at least one feature point of the face, such as the nose, left eye, right eye or mouth.
Step 303: the first face feature points are compared with the face feature points stored in the server, and the user identifier corresponding to the target user is determined.
In the embodiment of the application, the server maintains a face feature library storing each user's face feature points and a user identity information database storing each user's identity information. The server also stores each user's user identifier, which corresponds one-to-one with that user's feature points in the library and identity information in the database.
After extracting the first face feature points from the cropped image, the server compares them with each user's stored feature points. When the similarity between the first face feature points and some stored target face feature points exceeds a similarity threshold, the two are deemed to match; the user identifier associated with those target face feature points is then the user identifier of the target user in the cropped image.
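The comparison of step 303 can be sketched with cosine similarity, one common choice for face features; the patent only requires some similarity measure with a threshold, so the metric, threshold value and embedding format here are all assumptions.

```python
import numpy as np

def match_user(query_points, feature_library, threshold=0.6):
    """Return the user identifier of the best match above threshold, or None.

    `feature_library` maps user identifier -> stored face feature vector.
    """
    q = np.asarray(query_points, dtype=float)
    q = q / np.linalg.norm(q)
    best_id, best_sim = None, threshold
    for user_id, stored in feature_library.items():
        s = np.asarray(stored, dtype=float)
        sim = float(q @ (s / np.linalg.norm(s)))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    return best_id
```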
Step 304: the identity information of the target user corresponding to the user identifier is acquired.
In the embodiment of the application, since the user identifier corresponds one-to-one with the feature points in the face feature library and the identity information in the user identity information database, the server can query the target user's identity information from the database using that identifier.
The identity information of the target user includes the user's name, age, sex, ID number, mobile phone number, occupation, education and the like.
In addition, the server may count each target user's visit times and number of visits based on when and how often that user's identity information is recognized.
Further, after step 304, the method also includes: sending the identity information of the target user to a second terminal, so that the second terminal announces the target user's visit.
In the embodiment of the application, after recognizing the target user's identity information, the server sends it to the second terminal, which displays it on its screen to alert the relevant staff that the target user is visiting. Staff can view the identity information on the second terminal and devise a marketing scheme for the target user in time, improving marketing efficiency.
The second terminal may be a display screen deployed at the relevant site, such as one in a bank business hall, or a terminal device designated by the relevant staff, such as a mobile phone or computer held by a product manager.
Further, after step 303, the method also includes: sending the user identifier of the target user to the first terminal, so that the first terminal can calculate the IoU between the target face region and the second face regions of N consecutively extracted frames of a second image to be recognized, thereby tracking the target user; N is a positive integer greater than 1.
In the embodiment of the application, after determining the target user's user identifier, the server sends it to the first terminal, which caches the identifier together with its corresponding target face region.
The first terminal then sequentially extracts N frames of the second image to be recognized from the video stream (video frames located after the first image to be recognized), detects the second face region of each frame based on the target detection model, and calculates the IoU between the target face region and each second face region.
Whether the target user has been lost from tracking is judged from those IoU values. Specifically, when among the N consecutively extracted second face regions there is one whose IoU with the target face region is at least the second set threshold, the target user is deemed still tracked; when every such IoU is below the second set threshold, the target user is deemed lost from tracking.
If the target user is still tracked, the terminal continues to sequentially extract N frames of the second image to be recognized and perform the subsequent steps; if the target user is lost, the cached user identifier and target face region are deleted, and first images to be recognized are extracted from the stream anew, followed by the subsequent steps.
Tracking the target user on the first terminal reduces the computational load of both the terminal and the server and correspondingly speeds up the identification of each target user's identity information.
Referring to Fig. 4, a flowchart of the registration process for registering a user's face feature points and identity information in an embodiment of the present application is shown. The process specifically includes the following steps:
Step 401: a face image and identity information of a registered user sent by a third terminal are received.
In the embodiment of the application, the registered user can enter a face image and identity information on the third terminal, which then sends both to the server; the server receives the face image and the identity information sent by the third terminal.
The face image sent by the third terminal may be captured by the third terminal in real time or stored on it in advance. The third terminal may be a terminal device deployed at the relevant site, such as a face-capture terminal in a bank business hall, or a device held by the registered user, such as the user's mobile phone.
Step 402: the third face region where the face in the face image is located is detected based on a target detection model.
In the embodiment of the present application, the server also stores a pre-trained target detection model, which may be an SSD model.
After receiving the face image and the identity information of the registered user, the server inputs the face image into the target detection model, which outputs the third face region where the face in the image is located.
The third face region is likewise a rectangular box in practice and is likewise represented by its position information.
Step 403: the second face feature points in the third face region are extracted.
In the embodiment of the application, after detecting the third face region, the server extracts the second face feature points from it; these may be at least one feature point of the face, such as the nose, left eye, right eye or mouth.
Step 404: the second face feature points are compared with the face feature points stored in the server to determine whether the second face feature points are already stored there.
In the embodiment of the application, after extracting the second face feature points from the third face region, the server compares them with its stored feature points, i.e. determines whether they match any set of stored feature points, and thus whether the second face feature points are already stored, i.e. whether this user has already registered.
Step 405: when the second face feature points are already stored in the server, registration error information is returned to the third terminal.
In the embodiment of the application, when the second face feature points match one set of the stored feature points, they are deemed already stored in the server, so the user is deemed already registered. The server then returns registration error information to the third terminal to remind the user of the earlier registration.
In actual use, two users may look very similar. When the second face feature points are found to be already stored, the registration error information, the registered user's identity information, and the identity information associated with the matching stored feature points may also be sent to a staff member's terminal device, reminding staff to handle the case promptly, for example by registering the user manually.
Step 406: when the second face feature points are not stored in the server, the second face feature points and the identity information of the registered user are saved, and a user identifier associated with both is generated.
In the embodiment of the application, when the second face feature points match none of the stored feature points, they are deemed not yet stored, and the server saves the second face feature points and the registered user's identity information under a newly generated user identifier.
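Steps 404 to 406 amount to a duplicate check followed by a store-and-index operation. A sketch, reusing the match_user helper above; the identifier scheme (a UUID) and return format are assumptions, not the patent's design.

```python
import uuid

def register_user(feature_points, identity, feature_library, identity_store):
    """Save a new user, or signal that the face is already registered."""
    if match_user(feature_points, feature_library) is not None:
        return {"ok": False, "error": "registration error: face already stored"}
    user_id = str(uuid.uuid4())  # fresh identifier linking both stores
    feature_library[user_id] = feature_points
    identity_store[user_id] = identity
    return {"ok": True, "user_id": user_id}
```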
It should be noted that the first, second and third terminals are not the same terminal; the first terminal is in practice a development board mounted at the back of the camera, such as an RK3399 development board.
Further, after step 402, the method also includes:
determining, based on a face occlusion model, whether the face in the third face region is occluded;
when the face in the third face region is not occluded, detecting the coordinate positions of the second face key points in the third face region;
determining, from those coordinate positions, whether the face in the third face region is a frontal face;
and when the face in the third face region is a frontal face, executing the step of extracting the second face feature points in the third face region and the subsequent steps.
In the embodiment of the application, the server stores a trained face shielding model in advance, and the face shielding model is a neural network model. After detecting a third face area where the face in the face image is located, the server inputs the face image into the face shielding model, the face shielding model outputs a corresponding result, and the result represents whether the face in the third face area in the face image is shielded or not.
When the face in the third face region is not occluded, the server detects a second face key point in the third face region, and determines a coordinate position corresponding to the second face key point, wherein the second face key point comprises key points such as a left eye, a right eye, a nose, a left mouth corner, a right mouth corner and the like, and the coordinate position corresponding to the second face key point comprises a coordinate position of the left eye in the face image, a coordinate position of the right eye in the face image, a coordinate position of the nose in the face image, a coordinate position of the left mouth corner in the face image, a coordinate position of the right mouth corner in the face image and the like.
The server then determines whether the face in the third face region is a frontal face according to the coordinate positions of the second face key points. Specifically, the judgment is made from the distance between the coordinate positions of any two second face key points: when every such distance falls within its corresponding preset distance range, the face in the third face region is determined to be a frontal face; if any distance falls outside its corresponding preset range, the face in the third face region is determined not to be a frontal face.
For example, if the distance between the left eye and the right eye is L1 and the corresponding preset distance range for the left eye and the right eye is [L2, L3], then when L1 is not within [L2, L3], it is determined that the face in the third face region is not a frontal face.
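A minimal sketch of this frontal-face test follows, assuming five named key points and per-pair preset ranges; the key-point pairs and the numeric ranges below are illustrative stand-ins for the preset distance ranges the embodiment refers to.

```python
import math

# Assumed preset distance ranges [L2, L3] per key-point pair, in pixels.
PRESET_RANGES = {
    ("left_eye", "right_eye"): (40.0, 90.0),
    ("nose", "left_mouth_corner"): (20.0, 60.0),
    ("nose", "right_mouth_corner"): (20.0, 60.0),
}

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def is_frontal(keypoints: dict) -> bool:
    """keypoints maps names like 'left_eye' to (x, y) image coordinates."""
    for (a, b), (lo, hi) in PRESET_RANGES.items():
        if not lo <= distance(keypoints[a], keypoints[b]) <= hi:
            return False  # some pair falls outside its preset range
    return True
```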
When the server determines that the face in the third face region is a frontal face, it executes the step of extracting the second face feature points in the third face region and the subsequent steps, that is, steps 403 to 406.
However, when the face in the third face region is occluded, or the face in the third face region is not a frontal face, the face image sent by the third terminal does not meet the requirements; the whole process ends, and the server does not perform registration based on the face image sent by the third terminal and the identity information of the registering user.
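Putting the two gates together, the registration-side check might be sketched as follows; occlusion_model and its predict() interface are assumptions standing in for the trained face occlusion model, and is_frontal() is the helper sketched above.

```python
def accept_for_registration(face_image, keypoints, occlusion_model) -> bool:
    """Reject the image unless the face is unoccluded and frontal."""
    occluded = occlusion_model.predict(face_image) >= 0.5  # assumed: occlusion probability
    return (not occluded) and is_frontal(keypoints)
```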
In the embodiment of the application, the first face feature point in the cut image is extracted and compared with the face feature points stored in the server to determine the user identifier corresponding to the target user, and the identity information of the target user corresponding to that identifier is then obtained. When the captured video stream contains the face of the target user, the cut image correspondingly contains that face, so the identity information of the target user can be known in time based on the cut image; staff can then customize a marketing scheme for the user's identity information in time, which improves marketing efficiency and enhances the user's experience.
Referring to fig. 5, a block diagram of a terminal according to an embodiment of the present application is shown.
The terminal 500 provided in the embodiment of the present application is a first terminal, and the terminal 500 includes:
a first to-be-recognized image extraction module 501 configured to extract at least one frame of a first to-be-recognized image from a video stream;
a first face region detection module 502 configured to detect a first face region where a face in each frame of the first image to be recognized is located based on a target detection model;
a target face region determining module 503, configured to select a target image from the first to-be-recognized images of each frame, and determine a first face region corresponding to the target image as a target face region;
a target image clipping module 504, configured to clip the target image according to the target face region, so as to obtain a clipped image;
a cut image sending module 505 configured to send the cut image to a server, so as to identify the cut image through the server, and determine the identity information of the target user.
Optionally, the first to-be-recognized image extraction module 501 includes:
the first to-be-recognized image extraction sub-module is configured to extract at least two frames of the first image to be recognized from the video stream at intervals of a preset number of frames;
the target face region determining module 503 includes:
the intersection ratio calculation sub-module is configured to calculate, for the at least two frames of the first image to be recognized, the intersection ratio (intersection over union) between the first face region corresponding to each frame of the first image to be recognized and the first face region corresponding to the adjacent previous frame;
and the target face region determining sub-module is configured to select any one frame of the first image to be recognized from the at least two frames as the target image, and to determine the first face region corresponding to the target image as the target face region, when the intersection ratios corresponding to the at least two frames of the first image to be recognized are all greater than or equal to a first set threshold (see the sketch after this list).
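A minimal sketch of this selection logic, with face regions as (x1, y1, x2, y2) boxes; the FIRST_THRESHOLD value is an illustrative assumption for the first set threshold.

```python
FIRST_THRESHOLD = 0.5  # assumed value of the first set threshold

def iou(box_a, box_b) -> float:
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

def select_target(frames, boxes):
    """frames: consecutive first images to be recognized; boxes: their face regions."""
    stable = all(iou(boxes[i], boxes[i - 1]) >= FIRST_THRESHOLD
                 for i in range(1, len(boxes)))
    if stable:
        return frames[0], boxes[0]  # any frame may serve as the target image
    return None, None
```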
Optionally, the target image cropping module 504 includes:
a coordinate position detection submodule configured to detect a coordinate position corresponding to a first face key point in the target face region;
and the target image cutting sub-module is configured to intercept, from the target image, a region including the first face key points according to the coordinate positions corresponding to the first face key points, so as to obtain the cut image (see the sketch after this list).
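A sketch of this cropping step; the image is assumed to be an H x W x C array, and the MARGIN of context kept around the key points is an illustrative choice.

```python
MARGIN = 16  # assumed pixels of context kept around the key points

def crop_around_keypoints(image, keypoints: dict):
    """keypoints maps key-point names to (x, y) positions in the image."""
    xs = [p[0] for p in keypoints.values()]
    ys = [p[1] for p in keypoints.values()]
    h, w = image.shape[:2]
    x1 = max(int(min(xs)) - MARGIN, 0)
    y1 = max(int(min(ys)) - MARGIN, 0)
    x2 = min(int(max(xs)) + MARGIN, w)
    y2 = min(int(max(ys)) + MARGIN, h)
    return image[y1:y2, x1:x2]  # the cut image sent to the server
```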
Optionally, the terminal 500 further includes:
the user identification receiving module is configured to receive a user identification of a target user sent by the server and store the user identification and the target face area corresponding to the user identification;
the second to-be-recognized image extraction module is configured to sequentially extract N frames of the second image to be recognized from the video stream, where the second image to be recognized is a video frame located after the first image to be recognized in the video stream, and N is a positive integer greater than 1;
the second face area detection module is configured to detect a second face area where a face in each frame of the second image to be recognized is located based on the target detection model;
the intersection ratio calculation module is configured to calculate an intersection ratio of the target face area and a second face area corresponding to each frame of the second image to be recognized;
when, among the second face regions corresponding to the N consecutively extracted frames of the second image to be recognized, there exists a second face region whose intersection ratio with the target face region is greater than or equal to a second set threshold, the second to-be-recognized image extraction module is executed again;
and the user identifier deleting module is configured to delete the user identifier and the target face region when the intersection ratios between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second image to be recognized are all smaller than the second set threshold, and to execute the first to-be-recognized image extraction module 501 again (a sketch of this tracking logic follows this list).
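A sketch of this tracking loop, reusing the iou() helper sketched earlier; SECOND_THRESHOLD and N are assumed values for the second set threshold and the batch size.

```python
SECOND_THRESHOLD = 0.3  # assumed value of the second set threshold
N = 5                   # assumed number of consecutively extracted frames

def track(target_box, batches_of_boxes):
    """batches_of_boxes yields, per batch of N frames, the detected face boxes."""
    for boxes in batches_of_boxes:
        if any(iou(target_box, b) >= SECOND_THRESHOLD for b in boxes):
            continue   # target still in view: keep the cached user identifier
        return "lost"  # delete the user identifier and target face region
    return "tracking"
```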
Optionally, the first face area detecting module 502 includes:
a first to-be-recognized image compression sub-module configured to compress each frame of the first image to be recognized, so that the size of the compressed first image to be recognized is smaller than its size before compression;
and the first face region detection sub-module is configured to input each compressed frame of the first image to be recognized into the SSD (Single Shot MultiBox Detector) model to obtain the first face region where the face in each frame of the first image to be recognized is located (see the sketch after this list).
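A sketch of the compress-then-detect path; cv2.resize is standard OpenCV, while ssd_model and its detect() interface stand in for whatever SSD implementation is actually deployed, and the SCALE factor is an assumption.

```python
import cv2  # OpenCV

SCALE = 0.5  # assumed compression factor; the frame only needs to become smaller

def detect_face(frame, ssd_model):
    small = cv2.resize(frame, None, fx=SCALE, fy=SCALE)  # compressed frame
    boxes = ssd_model.detect(small)                      # assumed detector API
    # map boxes back to the original frame's coordinate system
    return [tuple(v / SCALE for v in box) for box in boxes]
```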
In the embodiment of the application, at least one frame of the first image to be recognized is extracted from a video stream, a target image is selected from these frames, and the first face region corresponding to the target image is determined as the target face region; the target image is then cut according to the target face region, and the cut image is sent to a server for face recognition so as to determine the identity information of the target user. When the captured video stream contains the face of the target user, the identity information of the target user can be known in time, and staff can customize a marketing scheme for the user's identity information in time, which improves marketing efficiency and enhances the user's experience.
Referring to fig. 6, a block diagram of a server according to an embodiment of the present application is shown.
The server 600 provided in the embodiment of the present application includes:
a cut image receiving module 601 configured to receive the cut image sent by the first terminal, where the cut image is obtained by the first terminal by cutting a target image according to a target face region;
a first face feature point extraction module 602 configured to extract the first face feature point in the cut image;
the user identifier determining module 603 is configured to compare the first face feature point with the face feature points stored in the server, and determine a user identifier corresponding to a target user;
an identity information obtaining module 604, configured to obtain identity information of a target user corresponding to the user identifier.
Optionally, the server 600 further includes:
and the identity information sending module is configured to send the identity information of the target user to a second terminal, so as to give, through the second terminal, a reminder that the target user is visiting.
Optionally, the server 600 further includes:
the user identifier sending module is configured to send the user identifier of the target user to the first terminal, so that the first terminal can calculate the intersection ratio between the target face region and the second face regions corresponding to N consecutively extracted frames of the second image to be recognized, thereby tracking the target user; N is a positive integer greater than 1.
Optionally, the server 600 further includes:
the face image receiving module is configured to receive a face image sent by a third terminal and identity information of a registered user;
the third face area detection module is configured to detect a third face area where a face in the face image is located based on the target detection model;
a second face feature point extraction module configured to extract second face feature points within the third face region;
the face feature point comparison module is configured to compare the second face feature point with the face feature points stored in the server and determine whether the second face feature point is stored in the server;
a registration error information returning module configured to return registration error information to the third terminal when the second face feature point is stored in the server;
a second face feature point saving module configured to save the second face feature point and the identity information of the registered user when the second face feature point is not stored in the server, and generate a user identifier associated with the second face feature point and the identity information of the registered user, respectively.
Optionally, the server 600 further includes:
an occlusion determination module configured to determine whether a face in the third face region is occluded based on a face occlusion model;
the coordinate position detection module is configured to detect the coordinate positions corresponding to the second face key points in the third face region when the face in the third face region is not occluded;
the frontal face judging module is configured to determine whether the face in the third face region is a frontal face according to the coordinate positions corresponding to the second face key points;
and when the face in the third face region is a frontal face, the second face feature point extraction module is executed.
In the embodiment of the application, the first face feature points in the cut image are extracted and compared with the face feature points stored in the server to determine the user identifier corresponding to the target user, and the identity information of the target user corresponding to that identifier is then obtained. When the captured video stream contains the face of the target user, the cut image correspondingly contains that face, so the identity information of the target user can be known in time based on the cut image; staff can then customize a marketing scheme for the user's identity information in time, which improves marketing efficiency and enhances the user's experience.
Correspondingly, the embodiment of the present application further provides a terminal, which includes a processor, a memory, and a computer program stored on the memory and capable of running on the processor, and when the computer program is executed by the processor, the steps of the above-mentioned terminal-side identity recognition method are implemented.
An embodiment of the present application further provides a computer-readable medium, where a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the above-mentioned identity recognition method on the terminal side are implemented.
Correspondingly, the embodiment of the present application further provides a server, which includes a processor, a memory, and a computer program stored on the memory and capable of running on the processor, and when the computer program is executed by the processor, the steps of the server-side identity recognition method are implemented.
The embodiment of the present application further provides a computer readable medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps of the server-side identity identification method.
Referring to fig. 7, a block diagram of an identification system according to an embodiment of the present application is shown.
The embodiment of the present application further provides an identity recognition system, which includes a camera 701, a second terminal 702, a third terminal 703, the first terminal 500 and the server 600.
The camera 701 is configured to capture a video stream and send it to the first terminal 500; the second terminal 702 is configured to receive the identity information of the target user sent by the server 600, so as to give a reminder that the target user is visiting; and the third terminal 703 is configured to send the face image and the identity information of a registering user to the server 600, and to receive the registration error information returned by the server 600 when the second face feature point contained in the face image is already stored in the server 600.
In this embodiment of the application, the camera 701 sends a video stream to the first terminal 500; the first terminal extracts video frames from the stream and selects from them the first images to be recognized and the second images to be recognized, where the second images to be recognized are video frames located after the first images to be recognized in the video stream.
The first terminal 500 inputs the first image to be recognized into the target detection model to obtain a first face region where the face of each frame of the first image to be recognized is located, selects a target image from each frame of the first image to be recognized, and determines the first face region corresponding to the target image as a target face region; then, detecting a first face key point in the target face area, determining the coordinate position of the first face key point, and intercepting an area including the first face key point from the target image according to the coordinate position corresponding to the first face key point to obtain a cut image.
The first terminal 500 sends the cut image to the server 600; the server 600 extracts the first face feature point in the cut image, compares it with the face feature points stored in the face feature library, and determines the user identifier corresponding to the target user. The server 600 then queries the identity information of the target user from the user identity information database according to the user identifier and sends it to the second terminal 702, so that the second terminal 702 can give a reminder that the target user is visiting.
Moreover, the server 600 may also send the user identifier corresponding to the target user to the first terminal 500, which caches the user identifier and the target face region corresponding to it. The first terminal 500 may also input the second image to be recognized into the target detection model to obtain the second face region where the face in the second image to be recognized is located, and then calculate the intersection ratio between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second image to be recognized, thereby tracking the target user.
In addition, a registering user may also send a face image and identity information to the server 600 through the third terminal 703 to register user information; when the registration succeeds, the server 600 stores the second face feature point of the face image in the face feature library and stores the identity information of the registered user in the user identity information database.
In practical use, the camera can be deployed in a bank business hall or any other setting where the identity information of target users needs to be recognized.
In the embodiment of the application, at least one frame of the first image to be recognized is extracted from a video stream, a target image is selected from these frames, and the first face region corresponding to the target image is determined as the target face region. The target image is cut according to the target face region, and the cut image is sent to the server; the server extracts the first face feature point in the cut image, compares it with the stored face feature points to determine the user identifier corresponding to the target user, and obtains the identity information of the target user corresponding to that identifier. When the captured video stream contains the face of the target user, the cut image correspondingly contains that face, so the identity information of the target user can be known in time based on the cut image; staff can then customize a marketing scheme for the user's identity information in time, which improves marketing efficiency and enhances the user's experience.
Since the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in a computing processing device according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 8 illustrates a computing processing device, such as the aforementioned server 600 or terminal 500, that may implement methods in accordance with the present application. The computing processing device conventionally includes a processor 810 and a computer program product or computer-readable medium in the form of a memory 820. The memory 820 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 820 has a storage space 830 for program code 831 for performing any of the method steps described above. For example, the storage space 830 for the program code may include respective program codes 831 for implementing various steps in the above method, respectively. The program code can be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a portable or fixed storage unit as described with reference to fig. 9. The storage unit may have memory segments, memory spaces, etc. arranged similarly to the memory 820 in the computing processing device of fig. 8. The program code may be compressed, for example, in a suitable form. Typically, the memory unit comprises computer readable code 831', i.e. code that can be read by a processor, such as 810, for example, which when executed by a computing processing device causes the computing processing device to perform the steps of the method described above.
Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Moreover, it is noted that instances of the word "in one embodiment" are not necessarily all referring to the same embodiment.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (17)

  1. An identity recognition method, applied to a first terminal, the method comprising:
    extracting at least one frame of a first image to be recognized from a video stream;
    detecting a first face area where a face in each frame of the first image to be recognized is located based on a target detection model;
    selecting a target image from the first to-be-recognized image of each frame, and determining a first face area corresponding to the target image as a target face area;
    according to the target face area, cutting the target image to obtain a cut image;
    and sending the cut image to a server so as to identify the cut image through the server to determine the identity information of the target user.
  2. The method according to claim 1, wherein the step of extracting at least one frame of a first image to be recognized from the video stream comprises:
    extracting at least two frames of the first image to be recognized from the video stream at intervals of a preset number of frames;
    the step of selecting a target image from the first to-be-recognized images of each frame and determining a first face region corresponding to the target image as a target face region includes:
    respectively calculating, for the at least two frames of the first image to be recognized, the intersection ratio (intersection over union) between the first face region corresponding to each frame of the first image to be recognized and the first face region corresponding to the adjacent previous frame;
    when the intersection ratio corresponding to the at least two frames of first images to be recognized is larger than or equal to a first set threshold, selecting any one frame of the first images to be recognized from the at least two frames of first images to be recognized as a target image, and determining a first face area corresponding to the target image as the target face area.
  3. The method according to claim 1, wherein the step of cutting the target image according to the target face region to obtain a cut image comprises:
    detecting a coordinate position corresponding to a first face key point in the target face area;
    and intercepting, from the target image, a region including the first face key point according to the coordinate position corresponding to the first face key point, so as to obtain the cut image.
  4. The method of claim 1, further comprising, after the step of sending the cut image to a server for recognition of the cut image by the server to determine identity information of a target user:
    receiving a user identification of a target user sent by the server, and storing the user identification and the target face area corresponding to the user identification;
    sequentially extracting N frames of a second image to be recognized from the video stream; the second image to be recognized is a video frame located after the first image to be recognized in the video stream, and N is a positive integer greater than 1;
    detecting a second face area where the face in each frame of the second image to be recognized is located based on the target detection model;
    calculating the intersection ratio of the target face area and a second face area corresponding to each frame of the second image to be recognized;
    when, among the second face regions corresponding to the N consecutively extracted frames of the second image to be recognized, there exists a second face region whose intersection ratio with the target face region is greater than or equal to a second set threshold, executing the step of sequentially extracting N frames of the second image to be recognized from the video stream and the subsequent steps;
    and when the intersection ratios between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second image to be recognized are all smaller than the second set threshold, deleting the user identifier and the target face region, and re-executing the step of extracting at least one frame of the first image to be recognized from the video stream and the subsequent steps.
  5. The method according to claim 1, wherein the step of detecting, based on the target detection model, a first face region in which a face is located in each frame of the first image to be recognized includes:
    compressing the first image to be recognized of each frame, so that the size of the compressed first image to be recognized is smaller than that of the first image to be recognized before compression;
    and inputting the compressed first image to be recognized into an SSD model to obtain a first face area where the face of each frame of the first image to be recognized is located.
  6. An identity recognition method is applied to a server, and the method comprises the following steps:
    receiving a cut image sent by a first terminal; the cut image is obtained by the first terminal by cutting a target image according to a target face region;
    extracting a first face characteristic point in the cut image;
    comparing the first face characteristic point with the face characteristic points stored in the server, and determining a user identifier corresponding to a target user;
    and acquiring the identity information of the target user corresponding to the user identification.
  7. The method according to claim 6, further comprising, after the step of obtaining the identity information of the target user corresponding to the user identifier:
    and sending the identity information of the target user to a second terminal, so as to give, through the second terminal, a reminder that the target user is visiting.
  8. The method of claim 6, further comprising, after the step of comparing the first facial feature points with facial feature points stored in the server to determine a user identifier corresponding to a target user:
    sending the user identification of the target user to the first terminal, so as to calculate the intersection ratio of a second face area corresponding to the continuously extracted N frames of second images to be recognized and the target face area through the first terminal, thereby realizing the tracking of the target user; and N is a positive integer greater than 1.
  9. The method of claim 6, further comprising, before the step of comparing the first face feature point with the face feature points stored in the server to determine the user identifier corresponding to the target user:
    receiving a face image and identity information of a registered user sent by a third terminal;
    detecting a third face area where a face in the face image is located based on a target detection model;
    extracting a second face characteristic point in the third face region;
    comparing the second face characteristic point with the face characteristic points stored in the server, and determining whether the second face characteristic point is stored in the server;
    when the second face characteristic point is stored in the server, returning registration error information to the third terminal;
    when the second face characteristic point is not stored in the server, the second face characteristic point and the identity information of the registered user are saved, and user identifications respectively associated with the second face characteristic point and the identity information of the registered user are generated.
  10. The method according to claim 9, further comprising, after the step of detecting a third face region where the face in the face image is located based on the target detection model, the steps of:
    determining whether the face in the third face region is occluded based on a face occlusion model;
    when the face in the third face region is not occluded, detecting the coordinate position corresponding to a second face key point in the third face region;
    determining whether the face in the third face region is a frontal face according to the coordinate position corresponding to the second face key point;
    and when the face in the third face region is a frontal face, executing the step of extracting the second face characteristic points in the third face region and the subsequent steps.
  11. A terminal, characterized in that the terminal is a first terminal, the first terminal comprising:
    the first to-be-recognized image extraction module is configured to extract at least one frame of first to-be-recognized image from the video stream;
    the first face region detection module is configured to detect a first face region where a face in each frame of the first image to be recognized is located based on a target detection model;
    the target face area determining module is configured to select a target image from the first to-be-recognized images of each frame and determine a first face area corresponding to the target image as a target face area;
    the target image cutting module is configured to cut the target image according to the target face area to obtain a cut image;
    the cut image sending module is configured to send the cut image to a server, so as to recognize the cut image through the server and determine the identity information of the target user.
  12. A server, comprising:
    the cut image receiving module is configured to receive a cut image sent by a first terminal; the cut image is obtained by the first terminal by cutting a target image according to a target face region;
    a first face feature point extraction module configured to extract a first face feature point in the cut image;
    the user identifier determining module is configured to compare the first face feature point with the face feature points stored in the server and determine the user identifier corresponding to a target user;
    and the identity information acquisition module is configured to acquire the identity information of the target user corresponding to the user identifier.
  13. A terminal, characterized in that it comprises a processor, a memory and a computer program stored on the memory and executable on the processor, which computer program, when executed by the processor, carries out the steps of the identification method according to any one of claims 1 to 5.
  14. A computer-readable medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the identification method according to one of claims 1 to 5.
  15. A server, characterized in that it comprises a processor, a memory and a computer program stored on said memory and executable on said processor, said computer program, when executed by said processor, implementing the steps of the identification method according to any one of claims 6 to 10.
  16. A computer-readable medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the identification method according to one of claims 6 to 10.
  17. An identification system comprising a camera, a second terminal, a third terminal, a first terminal as claimed in claim 11 and a server as claimed in claim 12;
    the camera is configured to acquire a video stream and send the video stream to the first terminal;
    the second terminal is configured to receive the identity information of the target user sent by the server, so as to give a reminder that the target user is visiting;
    the third terminal is configured to send the face image and the identity information of the registered user to the server, and to receive registration error information returned by the server when a second face characteristic point contained in the face image is already stored in the server.
CN202080003701.4A 2020-12-28 2020-12-28 Identity recognition method, terminal, server and system Pending CN115066712A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/139827 WO2022140879A1 (en) 2020-12-28 2020-12-28 Identity recognition method, terminal, server, and system

Publications (1)

Publication Number Publication Date
CN115066712A true CN115066712A (en) 2022-09-16

Family

ID=82258614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080003701.4A Pending CN115066712A (en) 2020-12-28 2020-12-28 Identity recognition method, terminal, server and system

Country Status (2)

Country Link
CN (1) CN115066712A (en)
WO (1) WO2022140879A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184136A (en) * 2015-09-08 2015-12-23 京东方科技集团股份有限公司 Identity recognition method, device and system
CN110276277A (en) * 2019-06-03 2019-09-24 罗普特科技集团股份有限公司 Method and apparatus for detecting facial image
CN110705451A (en) * 2019-09-27 2020-01-17 支付宝(杭州)信息技术有限公司 Face recognition method, face recognition device, terminal and server
CN111696142A (en) * 2020-06-12 2020-09-22 广东联通通信建设有限公司 Rapid face detection method and system

Also Published As

Publication number Publication date
WO2022140879A1 (en) 2022-07-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination