WO2022140879A1 - Identity recognition method, terminal, server, and system

Info

Publication number: WO2022140879A1
Application number: PCT/CN2020/139827
Authority: WO
Inventors: 许景涛
Original assignee: 京东方科技集团股份有限公司
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2022-07-07
Also published as: CN115066712A

Abstract

The present application relates to the technical field of computers, and provides an identity recognition method, a terminal, a server and a system. According to the present application, at least one frame of a first image to be recognized is extracted from a video stream, and a target image is selected from said image; a first face region corresponding to the target image is determined as a target face region; then, according to the target face region, the target image is cropped; and the cropped image is sent to a server for face recognition to determine identity information of a target user. When a collected video stream comprises the face of a target user, identity information of the target user can be known in a timely manner, and thus, a clerk can customize a marketing plan in a timely manner according to the identity information of the user, thereby increasing marketing efficiency and improving user experience.

Description

An identification method, terminal, server and system

Technical field

The present application relates to the field of computer technology, and in particular, to an identification method, terminal, server and system.

Background art

In daily life and work, people often need to visit places such as a bank business hall to handle business.

At present, before handling business, the user needs to take a number from the automatic number dispenser and wait in line to handle the business at the corresponding counter in numerical order. Only when the user handles the business at the counter and provides the relevant identity documents can the staff learn the identity information of the user. For example, for VIP users of a bank, the identity information of a VIP user can only be determined when the VIP user goes to the counter to conduct business.

However, with the current approach, in which users queue up and then go to the counter to handle business, the staff cannot learn the user's identity information in time. As a result, a marketing plan cannot be customized for the user in time according to the user's identity information, resulting in a poor user experience.

Overview

Some embodiments of the present application provide the following technical solutions:

In a first aspect, an identification method is provided, which is applied to a first terminal, and the method includes:

extracting at least one frame of a first image to be recognized from a video stream;

detecting, based on a target detection model, a first face region where the face in each frame of the first image to be recognized is located;

selecting a target image from the frames of the first image to be recognized, and determining the first face region corresponding to the target image as a target face region;

cropping the target image according to the target face region to obtain a cropped image; and

sending the cropped image to a server, so that the server recognizes the cropped image to determine the identity information of a target user.

In a second aspect, an identification method is provided, applied to a server, the method includes:

receiving a cropped image sent by a first terminal, the cropped image being obtained by the first terminal by cropping a target image according to a target face region;

extracting a first facial feature point from the cropped image;

comparing the first facial feature point with the facial feature points stored in the server to determine a user identifier corresponding to a target user; and

acquiring the identity information of the target user corresponding to the user identifier.

A third aspect provides a terminal, where the terminal is a first terminal, and the first terminal includes:

a first to-be-recognized image extraction module, configured to extract at least one frame of the first to-be-recognized image from the video stream;

a first face region detection module, configured to detect the first face region where the face in the first to-be-recognized image of each frame is located based on the target detection model;

a target face area determination module, configured to select a target image from the first to-be-recognized images of each frame, and determine the first face area corresponding to the target image as the target face area;

a target image cropping module, configured to crop the target image according to the target face region to obtain a cropped image;

a cropped image sending module, configured to send the cropped image to a server, so that the server recognizes the cropped image to determine the identity information of the target user.

In a fourth aspect, a server is provided, including:

a cropped image receiving module, configured to receive a cropped image sent by the first terminal; the cropped image is obtained by the first terminal after cropping the target image according to the target face region;

a first face feature point extraction module, configured to extract the first face feature point in the cropped image;

a user identification determining module, configured to compare the first facial feature points with the facial feature points stored in the server, and determine the user identification corresponding to the target user;

an identity information acquisition module, configured to acquire the identity information of the target user corresponding to the user identification.

In a fifth aspect, a terminal is provided, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program implementing the steps of the above-mentioned identity recognition method when executed by the processor.

In a sixth aspect, a computer-readable medium is provided, where a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the above-mentioned identification method are implemented.

In a seventh aspect, a server is provided, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program implementing the steps of the above-mentioned identity recognition method when executed by the processor.

In an eighth aspect, a computer-readable medium is provided, and a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the above-mentioned identification method are implemented.

In a ninth aspect, an identification system is provided, including a camera, a second terminal, a third terminal, the above-mentioned first terminal and the above-mentioned server;

Wherein, the camera is configured to capture a video stream and send the video stream to the first terminal;

The second terminal is configured to receive the identity information of the target user sent by the server, so as to give a reminder of the target user's visit;

The third terminal is configured to send the face image and the identity information of a registered user to the server, and to receive the registration error message returned by the server when the second facial feature point contained in the face image is already stored in the server.

The above description is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented according to the content of the description, and so that the above and other purposes, features and advantages of the present application are more obvious and easier to understand, specific embodiments of the present application are set out below.

Description of drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the accompanying drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without any creative effort.

FIG. 1 schematically shows a flowchart of an identity recognition method according to an embodiment of the present application;

FIG. 2 schematically shows a specific flowchart of an identity recognition method according to an embodiment of the present application;

FIG. 3 schematically shows a flowchart of another identity identification method according to an embodiment of the present application;

FIG. 4 schematically shows a flowchart of a registration process of facial feature points and identity information of registered users in the embodiment of the present application;

FIG. 5 schematically shows a structural block diagram of a terminal according to an embodiment of the present application;

FIG. 6 schematically shows a structural block diagram of a server according to an embodiment of the present application;

FIG. 7 schematically shows a structural diagram of an identity recognition system according to an embodiment of the present application;

FIG. 8 schematically shows a block diagram of a computing processing device for performing the method according to the present application; and

FIG. 9 schematically shows a memory unit for holding or carrying program code implementing the method according to the present application.

Detailed description

In order to make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present application.

Referring to FIG. 1, a flowchart of an identity recognition method according to an embodiment of the present application is shown, which is applied to a first terminal and may specifically include the following steps:

Step 101: Extract at least one frame of a first image to be recognized from a video stream.

In the embodiment of the present application, the camera collects the video stream in real time and sends the collected video stream to the first terminal. The first terminal then reads the video stream collected by the camera in real time in a multi-threaded manner and extracts video frames from the video stream.

Specifically, every frame of image in the video stream can be extracted to obtain multiple video frames, or one frame of image can be extracted from the video stream every preset number of frames to obtain multiple video frames, for example, one frame of image every 10 frames or one frame of image every 5 frames.

Next, at least one frame of the first image to be recognized is selected from the obtained video frames. Specifically, only one video frame may be selected from the multiple video frames as the first image to be recognized, in which case the number of frames of the first image to be recognized extracted from the video stream is one; alternatively, at least two video frames may be selected from the multiple video frames as the first image to be recognized, in which case the number of frames of the first image to be recognized extracted from the video stream is at least two.

For example, if the duration of the video stream is 1 s, the video stream contains 60 frames of images per second, and one frame of image is extracted from the video stream every 10 frames, then 6 video frames can be extracted from the video stream: the first video frame is the 1st frame image in the video stream, the second is the 11th, the third is the 21st, the fourth is the 31st, the fifth is the 41st, and the sixth is the 51st frame image in the video stream. The first video frame alone can then be selected as the first image to be recognized, in which case the number of frames of the first image to be recognized is one; alternatively, the first, second and third video frames can be selected as the first image to be recognized, in which case the number of frames of the first image to be recognized is 3.
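The sampling arithmetic in this example can be sketched as follows. This is a minimal illustration only; the function name `sample_frame_numbers` and its parameters are hypothetical and do not appear in the application.

```python
def sample_frame_numbers(total_frames: int, interval: int) -> list[int]:
    """Return 1-based frame numbers, taking one frame every `interval` frames."""
    if total_frames < 0 or interval < 1:
        raise ValueError("total_frames must be >= 0 and interval must be >= 1")
    return list(range(1, total_frames + 1, interval))


# A 1 s video stream at 60 frames per second, sampled every 10 frames.
video_frames = sample_frame_numbers(total_frames=60, interval=10)
print(video_frames)  # [1, 11, 21, 31, 41, 51]

# Either one video frame or several may be used as the first image to be recognized.
first_images_single = video_frames[:1]  # [1]
first_images_multi = video_frames[:3]   # [1, 11, 21]
```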

Step 102: Based on the target detection model, detect the first face region where the face in each frame of the first image to be recognized is located.

In the embodiment of the present application, a pre-trained target detection model is stored in the first terminal, and the target detection model is a neural network model. The target detection model is obtained by training on multiple sample images and the manually annotated face regions in each sample image.

After extracting at least one frame of the first image to be recognized from the video stream, the first terminal inputs each frame of the first image to be recognized into the target detection model and obtains the first face region where the face in each frame of the first image to be recognized is located.

In practice, the first face region is a rectangular box, and the detected first face region is represented by its position information, for example the coordinates of the upper left corner of the rectangular box together with the width and the height of the rectangular box.
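One possible way to hold this position information is sketched below. The class name `FaceBox`, its field names and the sample values are assumptions made for illustration; the application only states that the region is described by the upper left corner, the width and the height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceBox:
    """Rectangular face region: upper left corner plus width and height, in pixels."""
    x: float
    y: float
    width: float
    height: float

    def corners(self) -> tuple[float, float, float, float]:
        """Return the box as (left, top, right, bottom)."""
        return (self.x, self.y, self.x + self.width, self.y + self.height)

    def area(self) -> float:
        """Return the area of the box in square pixels."""
        return self.width * self.height


box = FaceBox(x=120, y=80, width=64, height=96)
print(box.corners())  # (120, 80, 184, 176)
print(box.area())     # 6144
```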

Step 103: Select a target image from the first images to be identified in each frame, and determine the first face region corresponding to the target image as the target face region.

In the embodiment of the present application, after detecting the first face region where the face in each frame of the first image to be recognized is located, the first terminal selects a target image from the frames of the first image to be recognized and determines the first face region corresponding to the target image as the target face region.

When the number of frames of the first image to be recognized is one, that first image to be recognized is the target image, and its first face region is the target face region of the target image. When the number of frames of the first image to be recognized is at least two, one frame is selected from the at least two frames of the first image to be recognized as the target image according to the relationship between the first face regions, and the first face region corresponding to the target image is the target face region of the target image.

For example, the number of frames of the first image to be recognized is 3 frames, which are the first video frame, the second video frame and the third video frame extracted from the video stream, and the third video frame can be selected as the target image, and the first face area of the third video frame is also the target face area of the target image.

Step 104: Crop the target image according to the target face region to obtain a cropped image.

In the embodiment of the present application, after determining the target image and the target face region corresponding to the target image, the first terminal crops the target image according to the target face region to obtain a cropped image, and the cropped image includes the face in the target face region.

Step 105: Send the cropped image to a server, so that the server can identify the cropped image to determine the identity information of the target user.

In the embodiment of the present application, after obtaining the cropped image, the first terminal sends the cropped image to the server. The server receives the cropped image, extracts the first facial feature point from it, and compares the first facial feature point with the facial feature points stored in the server to determine the user identifier corresponding to the target user. The server then obtains the identity information of the target user corresponding to the user identifier, thereby determining the target user's identity information.

Here, the first facial feature point may be at least one feature point of the human face, such as the nose, left eye, right eye or mouth. The user identifier is associated with both the facial feature points and the identity information of the target user stored in the server; it is in effect an index number through which the facial feature points stored in the server are linked to the identity information stored in the server. The identity information of the target user includes the target user's name, age, gender, ID number, mobile phone number, occupation, education and other information.
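The role of the user identifier as an index number linking the two stores can be sketched as follows. The application does not specify how feature points are compared, so the Euclidean nearest-match rule, the `max_distance` threshold, the three-element feature vectors and all names and sample records below are assumptions chosen purely for illustration.

```python
import math

# Hypothetical server-side stores, both keyed by the user identifier (index number).
STORED_FEATURES = {
    1001: [0.10, 0.80, 0.35],
    1002: [0.90, 0.20, 0.55],
}
STORED_IDENTITIES = {
    1001: {"name": "User A", "occupation": "engineer"},
    1002: {"name": "User B", "occupation": "teacher"},
}


def find_user_id(query, max_distance=0.25):
    """Return the identifier of the closest stored feature vector, or None."""
    best_id, best_dist = None, float("inf")
    for user_id, stored in STORED_FEATURES.items():
        dist = math.dist(query, stored)  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_id, best_dist = user_id, dist
    return best_id if best_dist <= max_distance else None


def identify(query):
    """Look up identity information through the user identifier."""
    user_id = find_user_id(query)
    return None if user_id is None else STORED_IDENTITIES[user_id]


print(identify([0.12, 0.78, 0.36]))  # record stored under identifier 1001
print(identify([0.50, 0.50, 0.50]))  # None: no stored features are close enough
```

Keeping the features and the identity records in separate stores joined only by the identifier mirrors the description of the identifier as an index number.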

When the camera collects a video stream containing the face of the target user, the identity information of the target user can be determined directly through the cooperation of the first terminal and the server. Staff can then customize a marketing plan in time according to the user's identity information, which improves marketing efficiency and the user experience.

In addition, the cropped image is obtained by cropping the target image according to the target face region, and it is the cropped image, rather than the full target image, that is sent to the server for identifying the target user. The purpose is to reduce the amount of computation on the server and relieve its computing pressure. Furthermore, the extraction of the target image and the cropping are deployed on the first terminal, while the extraction and comparison of the first facial feature points and the query of identity information are deployed on the server. This avoids the heavy bandwidth pressure of transmitting the video stream that arises when all functions are integrated in one device. By deploying some functions on the first terminal and others on the server, the bandwidth pressure of video stream transmission, that is, the network bandwidth pressure, can be effectively relieved.

In the embodiment of the present application, at least one frame of the first image to be recognized is extracted from the video stream and a target image is selected from it; the first face region corresponding to the target image is determined as the target face region; the target image is then cropped according to the target face region, and the cropped image is sent to the server for face recognition to determine the identity information of the target user. When the collected video stream contains the face of the target user, the identity information of the target user can be learned in time, so that staff can customize a marketing plan in time according to the user's identity information, improving marketing efficiency and the user experience.

Referring to FIG. 2, a specific flowchart of an identity recognition method according to an embodiment of the present application is shown, which is applied to the first terminal and may specifically include the following steps:

Step 201: Extract at least two frames of the first image to be recognized from the video stream, taking one frame every preset number of frames.

In the embodiment of the present application, the camera collects the video stream in real time, and sends the collected video stream to the first terminal, and the first terminal can extract one frame of image from the video stream every preset number of frames to obtain multiple video frames, Then, at least two video frames are selected from the plurality of video frames as the first to-be-identified images, that is, at least two frames of the first to-be-identified images are extracted from the video stream.

Specifically, among the at least two frames of the first image to be recognized, any two adjacent frames are consecutive among the extracted video frames, while in the video stream itself they are separated by the preset number of frames.

For example, 3 frames of the first image to be recognized are extracted from the video stream, namely the first, second and third video frames, where the first video frame is the 1st frame image in the video stream, the second video frame is the 11th frame image, and the third video frame is the 21st frame image. Among these 3 frames of the first image to be recognized, the first, second and third video frames are consecutive, with no other extracted video frames between them, while adjacent frames are actually separated in the video stream by the preset number of frames, which is 10.

Step 202: Perform compression processing on each frame of the first image to be identified, so that the size of the first image to be identified after compression is smaller than the size of the first image to be identified before compression.

In this embodiment of the present application, after extracting at least two frames of the first image to be recognized from the video stream, the first terminal performs compression processing on each frame of the first image to be recognized, so that the size of the compressed first image to be recognized is smaller than its size before compression. Specifically, the first image to be recognized may be scaled down proportionally.

For example, if the width of the first image to be recognized before compression is W and the height is H, then after compression the width is W/3 and the height is H/3.
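The proportional reduction can be sketched as a size computation. The helper name `compressed_size`, the integer rounding and the sample resolution are assumptions; the application only states that width and height are reduced by the same factor.

```python
def compressed_size(width: int, height: int, factor: int = 3) -> tuple[int, int]:
    """Scale both dimensions by the same factor so the aspect ratio is preserved."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    # Integer division is an assumed rounding rule; at least one pixel is kept.
    return max(1, width // factor), max(1, height // factor)


print(compressed_size(1920, 1080))  # (640, 360)
```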

Step 203: Input the compressed first image to be recognized in each frame into the SSD model to obtain a first face region where the face in the first image to be recognized is located in each frame.

In the embodiment of the present application, a pre-trained target detection model is stored in the first terminal. The target detection model is a neural network model, for which an SSD (Single Shot MultiBox Detector) model can be selected.

After compressing each frame of the first image to be recognized, the first terminal inputs each compressed frame into the SSD model, and the SSD model outputs the position information of the first face region where the face in each frame of the first image to be recognized is located, together with the category corresponding to that face region.

By compressing the first image to be recognized and feeding the compressed image to the SSD model for detection, the detection time can be shortened.

The SSD model includes a backbone network, a multi-scale detection sub-network and an NMS (Non-Maximum Suppression) network connected in sequence. The backbone network can be a deep convolutional neural network, such as a VGG16 network. The multi-scale detection sub-network consists of multiple convolutional layers connected in sequence, with a classification network layer and a position regression network layer connected to each convolutional layer.

When the compressed first image to be recognized is input into the SSD model, the backbone network first performs feature extraction on it to obtain a feature image. The feature image is then input into the multiple convolutional layers to obtain the category probability value (score) and position offset (location) corresponding to preselected boxes of different scales and aspect ratios. Next, the category probability values, position offsets and preselected boxes are input into the classification network layer and the position regression network layer for classification and regression. Finally, the preselected boxes after classification and regression are input into the NMS network, which eliminates redundant preselected boxes and selects the preselected box with the highest confidence as the rectangular box corresponding to the first face region.
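The final NMS stage can be illustrated with the commonly used greedy procedure. This is a sketch of generic non-maximum suppression, not the exact network described in the application; the box format `(left, top, right, bottom)`, the 0.5 overlap threshold and the sample boxes and scores are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (left, top, right, bottom)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


# Two heavily overlapping preselected boxes and one unrelated low-score box.
boxes = [(100, 100, 200, 200), (105, 102, 205, 198), (300, 300, 360, 380)]
scores = [0.92, 0.85, 0.40]

kept = non_max_suppression(boxes, scores)
print(kept)            # [0, 2]: the redundant second box is removed
print(boxes[kept[0]])  # (100, 100, 200, 200): highest-confidence box
```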

Step 204: For the at least two frames of the first image to be recognized, calculate the intersection ratio between the first face region corresponding to each frame of the first image to be recognized and the first face region corresponding to the adjacent previous frame of the first image to be recognized.

In the embodiment of the present application, for the at least two frames of the first image to be recognized, the intersection ratio (Intersection over Union, IoU) between the first face region corresponding to each frame of the first image to be recognized and the first face region corresponding to the adjacent previous frame is calculated. The intersection ratio of two first face regions is the ratio of the area of their overlap to the area of their union, that is, the total area covered by the two first face regions.

For example, 3 frames of the first image to be recognized are extracted from the video stream, namely the 1st, 11th and 21st frame images in the video stream. If the first face regions corresponding to these 3 frames are first face region 1, first face region 2 and first face region 3, respectively, then the intersection ratio of first face region 1 and first face region 2 and the intersection ratio of first face region 2 and first face region 3 are calculated.
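The intersection ratio for face regions stored as upper left corner, width and height can be computed as in the following sketch. The function name `iou_xywh` and the sample coordinates are assumptions for illustration.

```python
def iou_xywh(box_a, box_b):
    """Intersection ratio of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Hypothetical first face regions 1, 2 and 3 from the 1st, 11th and 21st frames.
region_1 = (100, 100, 100, 100)
region_2 = (110, 105, 100, 100)
region_3 = (115, 108, 100, 100)

# One intersection ratio per pair of adjacent frames.
consecutive_ious = [iou_xywh(region_1, region_2), iou_xywh(region_2, region_3)]
print([round(v, 2) for v in consecutive_ious])
```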

Step 205: When the intersection ratios corresponding to the at least two frames of the first image to be recognized are all greater than or equal to a first set threshold, select any one frame from the at least two frames of the first image to be recognized as the target image, and determine the first face region corresponding to the target image as the target face region.

In the embodiment of the present application, when the intersection ratios corresponding to the at least two frames of the first image to be recognized are all greater than or equal to the first set threshold, it is determined that the first terminal has not misdetected the first face region, that is, the detected first face region where the face is located is accurate. In this case, any one frame is selected from the at least two frames of the first image to be recognized as the target image, and the first face region corresponding to the target image is determined as the target face region. Usually, the last frame of the at least two frames of the first image to be recognized is selected as the target image. The first set threshold can be set manually, for example to 0.5.

For example, suppose the intersection ratio between first face region 1 and first face region 2 is calculated to be 0.6 and the intersection ratio between first face region 2 and first face region 3 is 0.8, both greater than the first set threshold of 0.5, and the 3 frames of the first image to be recognized extracted from the video stream are the 1st, 11th and 21st frame images in the video stream. The 21st frame image in the video stream is then selected as the target image, and first face region 3, which corresponds to the 21st frame image, is used as the target face region.

It should be noted that, when any of the intersection ratios corresponding to the at least two frames of the first image to be recognized is smaller than the first set threshold, it is determined that the detected first face region is inaccurate, that is, the first terminal may have mistaken another object for a face. In that case, the erroneous first face region is not processed further, and the step of extracting the first image to be recognized from the video stream and the subsequent steps are re-executed.
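The selection rule of step 205 and the fallback just described can be sketched as follows. The function name `select_target_frame` is hypothetical; choosing the last frame follows the usual choice mentioned above, and returning `None` stands in for re-executing the extraction step.

```python
def select_target_frame(frame_numbers, consecutive_ious, first_threshold=0.5):
    """Return the frame number chosen as target image, or None to restart extraction.

    `consecutive_ious[i]` is the intersection ratio between the face regions of
    `frame_numbers[i]` and `frame_numbers[i + 1]`.
    """
    if not frame_numbers or len(consecutive_ious) != len(frame_numbers) - 1:
        raise ValueError("expected one intersection ratio per pair of adjacent frames")
    if all(value >= first_threshold for value in consecutive_ious):
        return frame_numbers[-1]  # detection is consistent: take the last frame
    return None  # a ratio fell below the threshold: detection is unreliable


print(select_target_frame([1, 11, 21], [0.6, 0.8]))  # 21
print(select_target_frame([1, 11, 21], [0.6, 0.3]))  # None
```

With a single frame there are no adjacent pairs, so that frame is returned directly, which matches the case where the only first image to be recognized is the target image.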

Step 206: Detect the coordinate position corresponding to the first face key point in the target face area.

In the embodiment of the present application, after determining the target image and the target face area corresponding to the target image, the first terminal detects the first face key points in the target face area through a face detection algorithm, and determines the first face The coordinate position corresponding to the key point.

Here, the first face key points include key points such as the left eye, right eye, nose, left mouth corner and right mouth corner, and the coordinate positions corresponding to the first face key points include the coordinate positions in the target image of the left eye, the right eye, the nose, the left mouth corner and the right mouth corner.

By pre-determining the target face region in the target image, and then detecting only the first face key point in the target face region, the calculation amount of the first terminal can be reduced.

Step 207: According to the coordinate position corresponding to the first face key point, cut out a region including the first face key point from the target image to obtain the cropped image.

In the embodiment of the present application, after detecting the coordinate position corresponding to the first face key point in the target face region, the first terminal cuts out of the target image, according to that coordinate position, the region that includes the first face key point, and thereby obtains the cropped image.

Specifically, the target image is cropped according to the preset cropping size, so that the size of the cropped image is the preset cropping size, and the cropped image includes all the key points of the first face.
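A crop window of the preset cropping size that contains every key point can be computed as in the sketch below. The application does not say how the window is positioned, so centering it on the key points and clamping it to the image bounds are assumptions, as are the function name, the integer pixel coordinates and the sample values.

```python
def crop_window(keypoints, image_size, crop_size):
    """Return (left, top, right, bottom) of a crop_size window holding all key points.

    Key points are integer pixel coordinates; right and bottom are exclusive.
    """
    img_w, img_h = image_size
    crop_w, crop_h = crop_size
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop size exceeds image size")
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    slack_x = crop_w - 1 - (max(xs) - min(xs))
    slack_y = crop_h - 1 - (max(ys) - min(ys))
    if slack_x < 0 or slack_y < 0:
        raise ValueError("key points do not fit in the preset cropping size")
    # Split the spare room roughly evenly around the key points (assumed policy).
    left = min(xs) - slack_x // 2
    top = min(ys) - slack_y // 2
    # Shift the window back inside the image if it sticks out.
    left = min(max(left, 0), img_w - crop_w)
    top = min(max(top, 0), img_h - crop_h)
    return left, top, left + crop_w, top + crop_h


keypoints = [(210, 150), (260, 152), (235, 180), (215, 205), (255, 207)]
window = crop_window(keypoints, image_size=(640, 480), crop_size=(112, 112))
print(window)  # (180, 123, 292, 235)
```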

Step 208: Send the cropped image to a server, so that the server can identify the cropped image to determine the identity information of the target user.

The principle of this step is similar to that of step 105 in the above-mentioned first embodiment, and details are not repeated here.

Further, after step 208, the method also includes the following steps: receiving the user identification of the target user sent by the server, and storing the user identification and the target face area corresponding to the user identification; sequentially extracting N frames of the second image to be recognized from the video stream, where the second image to be recognized is a video frame located after the first image to be recognized in the video stream, and N is a positive integer greater than 1; based on the target detection model, detecting the second face area where the face in each frame of the second image to be recognized is located; calculating the intersection ratio between the target face area and the second face area corresponding to each frame of the second image to be recognized; when, among the second face areas corresponding to the N consecutively extracted frames of the second image to be recognized, there is a second face area whose intersection ratio with the target face area is greater than or equal to the second set threshold, performing again the step of sequentially extracting N frames of the second image to be recognized from the video stream and the subsequent steps; and when the intersection ratios between the target face area and the second face areas corresponding to the N consecutively extracted frames of the second image to be recognized are all less than the second set threshold, deleting the user identification and the target face area, and re-executing the step of extracting at least one frame of the first image to be recognized from the video stream and the subsequent steps.

In the embodiment of the present application, after the first terminal sends the cropped image to the server, the server extracts the first facial feature point in the cropped image and compares it with the facial feature points stored in the server to determine the user ID corresponding to the target user. The server then sends the user ID of the target user to the first terminal; the first terminal receives the user ID of the target user sent by the server, and caches the user ID together with the corresponding target face area.

Since the first terminal acquires the video stream captured by the camera in real time and splits the video stream into multiple video frames as required, a video frame located after the first image to be recognized in the video stream is called a second to-be-recognized image, and N frames of the second to-be-recognized image can be extracted sequentially from the video stream.

For example, the first image to be recognized includes a first video frame, a second video frame and a third video frame, where the first video frame is the 1st frame image in the video stream, the second video frame is the 11th frame image, and the third video frame is the 21st frame image. Video frames located after the third video frame are therefore called second images to be recognized; for instance, the fourth, fifth and sixth video frames are second images to be recognized, where the fourth video frame is the 31st frame image in the video stream, the fifth video frame is the 41st frame image, and the sixth video frame is the 51st frame image.
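
The frame numbers in this example follow a fixed spacing of 10 frames. A minimal sketch of that arithmetic is given below; the function name `sampled_frame_numbers` is hypothetical, and the interval of 10 is inferred from the example rather than prescribed.

```python
from typing import List


def sampled_frame_numbers(first: int, count: int, interval: int) -> List[int]:
    """Return `count` 1-based frame numbers starting at `first`, `interval` frames apart."""
    return [first + i * interval for i in range(count)]


# First images to be recognized: three frames starting at frame 1.
first_images = sampled_frame_numbers(first=1, count=3, interval=10)
# Second images to be recognized: the next sampled frames after the last first image.
second_images = sampled_frame_numbers(first=first_images[-1] + 10, count=3, interval=10)
```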

After extracting the second to-be-recognized image, the first terminal inputs each frame of the second to-be-recognized image into the target detection model to obtain a second face region where the face in each frame of the second to-be-recognized image is located. The target detection model may also be an SSD model, and the detection process of the second face region in the second to-be-recognized image is similar to the detection process of the first human-face region in the first to-be-recognized image, which will not be repeated here.

Then, the first terminal calculates the intersection ratio (that is, the intersection over union, IoU) between the cached target face region and the second face region corresponding to each frame of the second to-be-recognized image.

When, among the second face regions corresponding to the N consecutively extracted frames of the second to-be-recognized image, there is a second face region whose intersection ratio with the target face region is greater than or equal to the second set threshold, it is determined that tracking of the target user corresponding to the target face region has not been lost. In this case, the step of extracting N frames of the second to-be-recognized image from the video stream and the subsequent steps continue to be performed; that is, the first terminal continues to extract second to-be-recognized images from the video stream, detects the second face region where the face in each second to-be-recognized image is located, calculates the intersection ratio between the target face region and the second face region, and compares it with the second set threshold again.

When the intersection ratios between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second to-be-recognized image are all smaller than the second set threshold, it is determined that tracking of the target user corresponding to the target face region has been lost. In this case, the cached user ID and target face region are deleted, and step 201 and the subsequent steps are re-executed; that is, the first terminal re-extracts first images to be recognized from the video stream, detects the first face region where the face in each first image to be recognized is located, selects the target image and the target face region, crops the target image, and sends the cropped image to the server to identify the identity information of the target user.

The second set threshold and the first set threshold may be equal or unequal. The number N of consecutively extracted frames of the second to-be-recognized image can be set manually according to the actual situation, for example N = 20.
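
The tracking decision described above can be sketched as follows, reading the intersection ratio as the usual intersection over union (IoU) of two rectangles. The box layout (x1, y1, x2, y2), the function names, and the flattening of all second face regions from the N frames into one list are assumptions made for this illustration.

```python
from typing import Iterable, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def target_still_tracked(target_box: Box, second_boxes: Iterable[Box], threshold: float) -> bool:
    """True if any second face region from the N frames overlaps the cached target enough.

    False means every intersection ratio is below the threshold, i.e. tracking is
    lost and the cached user ID and target face area should be deleted.
    """
    return any(iou(target_box, box) >= threshold for box in second_boxes)
```

With this reading, a batch of N frames in which no face is detected at all also counts as lost tracking, because the list of second face regions is empty.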

In this embodiment of the present application, after determining the user identifier of the target user, the server sends the user identifier to the first terminal. The first terminal stores the user identifier and the target face area and, based on the target face area, evaluates the second to-be-recognized images subsequently extracted from the video stream, so as to track the target user. While tracking of the target user is not lost, the first terminal does not crop the second to-be-recognized images and does not send cropped images to the server to identify the target user again. This avoids the terminal sending every cropped image to the server, which would substantially increase the computing load on the server, and the speed of identifying the identity information of each target user improves accordingly. When tracking of the target user is lost, the identity information of the target user is detected again through the first terminal and the server.

In the embodiment of the present application, at least two frames of the first to-be-recognized image are extracted from the video stream. When the intersection ratio between the first face region corresponding to each frame of the first to-be-recognized image and the first face region corresponding to the adjacent previous frame of the first to-be-recognized image is greater than or equal to the first set threshold, the target image is selected from the at least two frames of the first to-be-recognized image, and the first face region corresponding to the target image is determined as the target face region. The coordinate positions corresponding to the first face key points in the target face region are then identified, the target image is cropped according to those coordinate positions, and the cropped image is sent to the server for face recognition to determine the identity information of the target user. When the collected video stream contains the face of the target user, the identity information of the target user can be known in time, so that staff can tailor a marketing plan to the user's identity information in time, improving marketing efficiency and the user experience. Moreover, the target image is selected only when the intersection ratios corresponding to the at least two frames of the first to-be-recognized image are all greater than or equal to the first set threshold, which avoids the problem that the target detection model mistakenly determines the region of another object as the first face region and the identity information of the target user consequently cannot be detected.

Referring to FIG. 3 , a flowchart of another identity recognition method according to an embodiment of the present application is shown, which is applied to a server and may specifically include the following steps:

Step 301: Receive a cropped image sent by a first terminal; the cropped image is obtained by the first terminal after cropping the target image according to the target face region.

In the embodiment of the present application, the first terminal acquires the video stream collected by the camera in real time, extracts at least one frame of the first image to be recognized from the video stream, and then, based on the target detection model, detects the first face area where the face in each frame of the first image to be recognized is located. It selects the target image from the frames of the first image to be recognized, determines the first face area corresponding to the target image as the target face area, crops the target image according to the target face area to obtain the cropped image, and finally sends the cropped image to the server.

Correspondingly, the server receives the cropped image sent by the first terminal, and the cropped image is obtained by the first terminal after cropping the target image according to the target face area. Specifically, the cropped image includes the first face key points in the target face area of the target image.

Step 302: Extract the first face feature point in the cropped image.

In this embodiment of the present application, after receiving the cropped image sent by the first terminal, the server extracts a first facial feature point in the cropped image, and the first facial feature point may be at least one feature point in a human face, such as a nose , left eye, right eye, mouth and other feature points.

Step 303: Compare the first face feature point with the face feature point stored in the server to determine the user identifier corresponding to the target user.

In the embodiment of the present application, a face feature database and a user identity information database are set up in the server. The face feature database stores the face feature points of each user, and the user identity information database stores the identity information of each user. The server also stores a user ID for each user, and each user ID is in one-to-one correspondence with that user's facial feature points stored in the face feature database and that user's identity information stored in the user identity information database.

After extracting the first face feature point in the cropped image, the server compares the first face feature point with each face feature point stored in the face feature database. When the similarity between the first face feature point and a target face feature point stored in the database is greater than the similarity threshold, it is determined that the first face feature point matches the target face feature point, and the user ID corresponding to the target face feature point is then queried. This user ID is the user ID of the target user corresponding to the cropped image.
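
The comparison against the face feature database can be sketched as below. The text does not specify how similarity is computed, so this example assumes that the face features are numeric vectors compared by cosine similarity, and that the best match above the threshold is returned; `cosine_similarity`, `match_user` and the dictionary layout are hypothetical names chosen for the illustration.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def match_user(
    query: Sequence[float],
    feature_db: Dict[str, Sequence[float]],
    similarity_threshold: float,
) -> Optional[str]:
    """Return the user ID whose stored features are most similar to `query`.

    Only similarities strictly greater than the threshold count as a match;
    None means no stored user matches.
    """
    best_id, best_score = None, similarity_threshold
    for user_id, stored in feature_db.items():
        score = cosine_similarity(query, stored)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id
```

Returning the single best match, rather than the first one above the threshold, is a design choice of this sketch that keeps the result independent of the database iteration order.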

Step 304: Obtain the identity information of the target user corresponding to the user identifier.

In the embodiment of the present application, the user identifier is in one-to-one correspondence with the user's face feature points stored in the face feature database and the user's identity information stored in the user identity information database. Therefore, after the server obtains the user identifier corresponding to the target user, the identity information of the target user can be queried from the user identity information database according to that user identifier.

The identity information of the target user includes the target user's name, age, gender, ID card number, mobile phone number, occupation, education background and other information.

In addition, the server may count the visiting times and the number of visits of the target user based on the time at which, and the number of times, the identity information of each target user is identified.
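
A minimal sketch of the identity lookup and visit statistics is given below. The class name `VisitRecorder`, its fields and the sample record are hypothetical. As a simplification, the sketch treats every identification as one visit, whereas a real deployment would probably merge identifications that are close together in time.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List


class VisitRecorder:
    """Looks up identity information by user ID and logs each identification time."""

    def __init__(self, identity_db: Dict[str, dict]) -> None:
        self.identity_db = identity_db  # user ID -> identity information
        self.visit_times: Dict[str, List[datetime]] = defaultdict(list)

    def identify(self, user_id: str, when: datetime) -> dict:
        """Return the identity information for `user_id` and record the time."""
        info = self.identity_db[user_id]
        self.visit_times[user_id].append(when)
        return info

    def visit_count(self, user_id: str) -> int:
        """Number of recorded identifications for `user_id` (0 if never seen)."""
        return len(self.visit_times.get(user_id, []))
```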

Further, after step 304, the method further includes: sending the identity information of the target user to a second terminal, so that the second terminal gives a reminder of the target user's visit.

In this embodiment of the present application, after identifying the identity information of the target user, the server sends the identity information of the target user to the second terminal. The second terminal receives the identity information of the target user sent by the server and displays it on the display screen of the second terminal, to remind the relevant staff of the target user's visit. The staff can view the identity information of the target user through the second terminal and formulate a marketing plan for the target user in time, improving marketing efficiency.

The second terminal may be a display screen deployed at the corresponding site, such as a display screen deployed in a bank business hall. The second terminal may also be a terminal device designated by the relevant staff, such as a mobile phone or computer held by a product manager.

Further, after step 303, the method also includes: sending the user identification of the target user to the first terminal, so that the first terminal calculates the intersection ratio between the target face area and the second face areas corresponding to the N consecutively extracted frames of the second to-be-identified image, thereby tracking the target user; N is a positive integer greater than 1.

In the embodiment of the present application, after determining the user ID corresponding to the target user, the server sends the user ID of the target user to the first terminal. The first terminal receives the user ID of the target user sent by the server, and caches the user ID and the corresponding target face area.

Next, the first terminal sequentially extracts N frames of the second to-be-recognized image from the video stream, where the second to-be-recognized image is a video frame located after the first to-be-recognized image in the video stream. Then, based on the target detection model, it detects the second face area where the face in each frame of the second to-be-recognized image is located, and calculates the intersection ratio between the target face area and the second face area corresponding to each frame of the second to-be-recognized image.

Whether tracking of the target user has been lost is determined according to the intersection ratios between the target face region and the second face regions corresponding to the N frames of the second to-be-recognized image. Specifically, when, among the second face regions corresponding to the N consecutively extracted frames of the second to-be-recognized image, there is a second face region whose intersection ratio with the target face region is greater than or equal to the second set threshold, it is determined that tracking of the target user corresponding to the target face region has not been lost; when the intersection ratios between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second to-be-recognized image are all smaller than the second set threshold, it is determined that tracking of the target user corresponding to the target face region has been lost.

When tracking of the target user has not been lost, the step of sequentially extracting N frames of the second to-be-recognized image from the video stream and the subsequent steps continue to be performed. When tracking of the target user has been lost, the cached user ID and target face area are deleted, and the step of extracting the first image to be recognized from the video stream and the subsequent steps are re-executed.

Tracking the target user through the first terminal can reduce the calculation pressure of the terminal and the server, and correspondingly improve the identification speed of the identity information of each target user.

Referring to FIG. 4, a flowchart of the registration process of the facial feature points and identity information of the registered user in the embodiment of the present application is shown, which may specifically include the following steps:

Step 401: Receive the face image and the identity information of the registered user sent by the third terminal.

In the embodiment of this application, the registered user can input a face image and the registered user's identity information on the third terminal. The third terminal then sends the face image and the identity information of the registered user to the server, and the server receives the face image and the identity information of the registered user sent by the third terminal.

The face image sent by the third terminal may be collected in real time by the third terminal, or may be pre-stored on the third terminal. The third terminal may be a terminal device deployed at the corresponding site, such as a terminal deployed in a bank business hall that can collect face images, or a terminal device held by the registered user, such as the registered user's mobile phone.

Step 402: Based on the target detection model, detect a third face region where the face in the face image is located.

In the embodiment of the present application, a pre-trained target detection model is also stored in the server, and the target detection model may be an SSD model.

After receiving the face image and the identity information of the registered user sent by the third terminal, the server inputs the face image into the target detection model, and the target detection model outputs the third face area where the face in the face image is located.

The third face area is actually a rectangular frame area, and the detected third face area is also represented by the position information of the third face area.

Step 403: Extract the second face feature points in the third face region.

In this embodiment of the present application, after detecting the third face region where the face in the face image is located, the server extracts a second face feature point in the third face region. The second face feature point may be at least one feature point in the human face, such as the nose, left eye, right eye, mouth and other feature points.

Step 404: Compare the second face feature point with the face feature point stored in the server to determine whether the second face feature point is stored in the server.

In the embodiment of the present application, after extracting the second face feature point in the third face area, the server compares the second face feature point with the face feature points stored in the server, that is, determines whether the second face feature point matches any face feature point stored in the server, so as to determine whether the second face feature point is already stored in the server, i.e., whether the registered user has already registered.

Step 405: When the second face feature point is stored in the server, return registration error information to the third terminal.

In the embodiment of the present application, when the second facial feature point matches one of the facial feature points stored in the server, it is determined that the second facial feature point is stored in the server. Correspondingly, it is also determined that the registered user has registered before. In this case, the server returns registration error information to the third terminal, reminding the registered user that registration has already been completed.

In actual use, two users may look alike. When it is determined that the second facial feature point is stored in the server, the registration error information, the identity information of the registered user, and the identity information stored in the server that corresponds to the facial feature point matching the second facial feature point can also be sent to the terminal device of a staff member, to remind the staff to handle the case in time, for example by registering the user manually.

Step 406: When the second face feature point is not stored in the server, save the second face feature point and the identity information of the registered user, and generate a user ID associated with the second face feature point and the registered user's identity information.

In the embodiment of the present application, when the second facial feature point does not match any facial feature point stored in the server, it is determined that the second facial feature point is not stored in the server, and the server then saves the second facial feature point and the identity information of the registered user. Specifically, the server stores the second facial feature point in the facial feature database, stores the identity information of the registered user in the user identity information database, and generates a user ID associated with the second facial feature point and the registered user's identity information.
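
Steps 404 to 406 can be sketched as a single registration function. As before, this is only an illustration under stated assumptions: features are treated as numeric vectors compared by cosine similarity, the two databases are plain dictionaries keyed by user ID, the user ID is a random UUID string, and the names `register_user` and `cosine_similarity` are hypothetical.

```python
import math
import uuid
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def register_user(
    features: Sequence[float],
    identity: dict,
    feature_db: Dict[str, List[float]],
    identity_db: Dict[str, dict],
    similarity_threshold: float,
) -> Optional[str]:
    """Register a new user, or return None if matching features are already stored.

    None corresponds to step 405 (the caller returns registration error
    information); a returned user ID corresponds to step 406.
    """
    for stored in feature_db.values():
        if cosine_similarity(features, stored) > similarity_threshold:
            return None  # already registered (or a look-alike): do not store anything
    user_id = uuid.uuid4().hex  # one ID links both database entries
    feature_db[user_id] = list(features)
    identity_db[user_id] = dict(identity)
    return user_id
```

Using the same generated ID as the key in both dictionaries is what gives the one-to-one correspondence between a user's features and identity information.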

It should be noted that the first terminal, the second terminal and the third terminal are not the same terminal, and the first terminal is actually a development board fixed on the back end of the camera, such as the rk3399 development board.

Further, after step 402, the method also includes: determining, based on the face occlusion model, whether the face in the third face area is occluded; when the face in the third face area is not occluded, detecting the coordinate positions corresponding to the second face key points in the third face area; determining whether the face in the third face area is a frontal face according to the coordinate positions corresponding to the second face key points; and when the face in the third face area is a frontal face, performing the step of extracting the second face feature point in the third face area and the subsequent steps.

In the embodiment of the present application, the server prestores a trained face occlusion model, and the face occlusion model is a neural network model. After detecting the third face area where the face in the face image is located, the server inputs the face image to the face occlusion model, and the face occlusion model outputs a corresponding result, which indicates whether the face in the third face area of the face image is occluded.

When the face in the third face area is not occluded, the server detects the second face key points in the third face area and determines the coordinate positions corresponding to the second face key points. The second face key points include key points such as the left eye, right eye, nose, left mouth corner and right mouth corner, and the coordinate positions corresponding to the second face key points accordingly include the coordinate positions, in the face image, of the left eye, the right eye, the nose, the left mouth corner, the right mouth corner, and so on.

Then, the server determines whether the face in the third face area is a frontal face according to the coordinate positions corresponding to the second face key points. Specifically, this is judged from the distances between the coordinate positions of pairs of second face key points: when the distance between the coordinate positions of every pair of second face key points is within its corresponding preset distance range, it is determined that the face in the third face area is a frontal face; if the distance between the coordinate positions of some pair of second face key points is outside its corresponding preset distance range, it is determined that the face in the third face area is not a frontal face.

For example, if the distance between the left eye and the right eye is L1, the corresponding preset distance range between the left eye and the right eye is [L2, L3], and L1 is not within the preset distance range [L2, L3], it is determined that the face in the third face area is not a frontal face.
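
The frontal-face test can be sketched as a check of key-point distances against preset ranges. The function name `is_frontal_face`, the key-point names and the numeric ranges are hypothetical; the text does not give concrete range values, and the Euclidean distance in pixels is an assumption of this example.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def is_frontal_face(
    keypoints: Dict[str, Point],
    distance_ranges: Dict[Tuple[str, str], Tuple[float, float]],
) -> bool:
    """True only if every listed key-point pair lies within its preset distance range."""
    for (name_a, name_b), (low, high) in distance_ranges.items():
        distance = math.dist(keypoints[name_a], keypoints[name_b])  # Euclidean distance
        if not (low <= distance <= high):
            return False  # one pair out of range is enough to reject the face
    return True


# Hypothetical range [L2, L3] = [35, 45] pixels for the eye-to-eye distance L1.
ranges = {("left_eye", "right_eye"): (35.0, 45.0)}
frontal = is_frontal_face({"left_eye": (30, 40), "right_eye": (70, 40)}, ranges)
turned = is_frontal_face({"left_eye": (30, 40), "right_eye": (50, 40)}, ranges)
```

In the second call the eyes appear only 20 pixels apart, which falls below the assumed lower bound, so the face is treated as not frontal.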

When the server determines that the face in the third face area is a frontal face, it performs the steps of extracting the second face feature point in the third face area and the subsequent steps, ie, steps 403 to 406 are performed.

However, when the face in the third face area is occluded, or the face in the third face area is not a frontal face, the face image sent by the third terminal does not meet the requirements. The whole execution process then ends, and the server does not perform registration based on the face image and the identity information of the registered user sent by the third terminal.

In the embodiment of the present application, the first face feature point in the cropped image is extracted and compared with the face feature points stored in the server to determine the user ID corresponding to the target user, and the identity information of the target user corresponding to the user ID is then obtained. When the collected video stream contains the face of the target user, the cropped image also contains the face of the target user, so the identity information of the target user can be known in time based on the cropped image. Staff can then tailor a marketing plan to the user's identity information in time, improving marketing efficiency and the user experience.

Referring to FIG. 5 , a structural block diagram of a terminal according to an embodiment of the present application is shown.

The terminal 500 provided in this embodiment of the present application is a first terminal, and the terminal 500 includes:

The first to-be-recognized image extraction module 501 is configured to extract at least one frame of the first to-be-recognized image from the video stream;

The first face region detection module 502 is configured to detect the first face region where the face in the first to-be-recognized image of each frame is located based on the target detection model;

The target face area determination module 503 is configured to select a target image from the first to-be-recognized images of each frame, and determine the first face area corresponding to the target image as the target face area;

The target image cropping module 504 is configured to crop the target image according to the target face region to obtain a cropped image;

The cropped image sending module 505 is configured to send the cropped image to a server, so as to identify the cropped image by the server to determine the identity information of the target user.

Optionally, the first to-be-recognized image extraction module 501 includes:

The first to-be-recognized image extraction submodule is configured to extract at least two frames of the first to-be-recognized image from the video stream at intervals of a preset number of frames;

The target face area determination module 503 includes:

The intersection ratio calculation sub-module is configured to, for the at least two frames of the first to-be-recognized image, respectively calculate the intersection ratio between the first face region corresponding to each frame of the first to-be-recognized image and the first face region corresponding to the adjacent previous frame of the first to-be-recognized image;

The target face area determination sub-module is configured to, when the intersection ratios corresponding to the at least two frames of the first to-be-recognized image are all greater than or equal to a first set threshold, select any frame of the first to-be-recognized image from the at least two frames as the target image, and determine the first face region corresponding to the target image as the target face region.

Optionally, the target image cropping module 504 includes:

a coordinate position detection submodule, configured to detect the coordinate position corresponding to the first face key point in the target face area;

The target image cropping sub-module is configured to intercept an area including the first facial key point from the target image according to the coordinate position corresponding to the first facial key point to obtain the cropped image.

Optionally, the terminal 500 further includes:

a user identification receiving module, configured to receive the user identification of the target user sent by the server, and store the user identification and the target face area corresponding to the user identification;

The second to-be-recognized image extraction module is configured to sequentially extract N frames of the second to-be-recognized image from the video stream; the second to-be-recognized image is a video frame located after the first to-be-recognized image in the video stream, and N is a positive integer greater than 1;

The second face region detection module is configured to detect, based on the target detection model, the second face region where the face in the second to-be-recognized image of each frame is located;

an intersection ratio calculation module, configured to calculate the intersection ratio between the target face region and the second face region corresponding to the second to-be-recognized image of each frame;

when, among the second face regions corresponding to the N consecutively extracted frames of the second to-be-recognized image, there is a second face region whose intersection ratio with the target face region is greater than or equal to the second set threshold, the second to-be-recognized image extraction module is executed again;

The user identification deletion module is configured to, when the intersection ratios between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second to-be-recognized image are all smaller than the second set threshold, delete the user identification and the target face region, and execute the first to-be-recognized image extraction module 501 again.

Optionally, the first face region detection module 502 includes:

The first to-be-recognized image compression sub-module is configured to perform compression processing on each frame of the first to-be-recognized image, so that the size of the compressed first to-be-recognized image is smaller than the size of the first to-be-recognized image before compression;

The first face area detection sub-module is configured to input each compressed frame of the first to-be-recognized image into the SSD model, to obtain the first face area where the face in each frame of the first to-be-recognized image is located.

Referring to FIG. 6 , a structural block diagram of a server according to an embodiment of the present application is shown.

The server 600 provided by the embodiment of the present application includes:

The cropped image receiving module 601 is configured to receive the cropped image sent by the first terminal; the cropped image is obtained by the first terminal after cropping the target image according to the target face area;

The first face feature point extraction module 602 is configured to extract the first face feature point in the cropped image;

The user identification determination module 603 is configured to compare the first facial feature point with the facial feature point stored in the server, and determine the user identification corresponding to the target user;

The identity information obtaining module 604 is configured to obtain the identity information of the target user corresponding to the user identification.
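The comparison and lookup performed by modules 603 and 604 might look like the sketch below. The text only says that feature points are "compared"; representing them as float vectors, scoring them with cosine similarity and applying a `min_similarity` threshold are assumptions made for illustration, as are the two in-memory dictionaries standing in for the server's databases.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_user_id(query: Sequence[float],
                 feature_db: Dict[str, List[float]],
                 min_similarity: float) -> Optional[str]:
    """Return the user ID of the best stored match, or None if nothing is close enough."""
    best_id, best_score = None, min_similarity
    for user_id, stored in feature_db.items():
        score = cosine_similarity(query, stored)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


def identify(query: Sequence[float],
             feature_db: Dict[str, List[float]],
             identity_db: Dict[str, dict],
             min_similarity: float = 0.8) -> Optional[dict]:
    """Determine the user ID from the features, then fetch the identity information."""
    user_id = find_user_id(query, feature_db, min_similarity)
    return None if user_id is None else identity_db.get(user_id)
```

Keeping the feature store and the identity store separate, linked only by the user ID, mirrors the two-step structure of modules 603 and 604.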

Optionally, the server 600 further includes:

The identity information sending module is configured to send the identity information of the target user to a second terminal, so that a reminder of the target user's visit is issued through the second terminal.

Optionally, the server 600 further includes:

The user identification sending module is configured to send the user identification of the target user to the first terminal, so that the first terminal calculates the intersection ratio between the target face area and the second face area corresponding to each of the N consecutively extracted frames of the second to-be-recognized image, thereby tracking the target user; N is a positive integer greater than 1.

Optionally, the server 600 further includes:

a face image receiving module, configured to receive the face image and the identity information of the registered user sent by the third terminal;

The third face area detection module is configured to detect the third face area where the face in the face image is located based on the target detection model;

The second face feature point extraction module is configured to extract the second face feature point in the third face region;

a face feature point comparison module, configured to compare the second face feature point with the face feature point stored in the server, and determine whether the second face feature point is stored in the server;

A registration error information return module is configured to return registration error information to the third terminal when the second face feature point is stored in the server;

The second face feature point storage module is configured to save the second face feature point and the identity information of the registered user when the second face feature point is not stored in the server, and generate a user identifier associated with both the second face feature point and the identity information of the registered user.
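The registration branch described by these modules can be sketched as below. The duplicate test is passed in as a callable because the text does not say how "stored in the server" is decided; the UUID-based identifier, the dictionary stores and the `None` return value standing for the registration error are likewise hypothetical.

```python
import uuid
from typing import Callable, Dict, List, Optional

Feature = List[float]


def register(feature: Feature,
             identity: dict,
             feature_db: Dict[str, Feature],
             identity_db: Dict[str, dict],
             is_same_face: Callable[[Feature, Feature], bool]) -> Optional[str]:
    """Register a user, or return None when the face is already stored.

    A None result corresponds to returning registration error information
    to the third terminal.
    """
    for stored in feature_db.values():
        if is_same_face(feature, stored):
            return None  # duplicate face: reject the registration
    user_id = uuid.uuid4().hex  # assumed ID scheme
    feature_db[user_id] = feature    # face feature database
    identity_db[user_id] = identity  # user identity information database
    return user_id
```

Because both stores are keyed by the same generated identifier, a later feature match yields the identity information through a single lookup.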

Optionally, the server 600 further includes:

an occlusion judgment module, configured to determine whether the face in the third face region is occluded based on the face occlusion model;

a coordinate position detection module, configured to detect the coordinate position corresponding to the key point of the second face in the third face area when the face in the third face area is not blocked;

a front face judgment module, configured to determine whether the face in the third face area is a front face according to the coordinate position corresponding to the second face key point;

When the face in the third face region is a frontal face, the second face feature point extraction module is executed.
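The gating order of these modules (occlusion check, then key-point detection, then frontal-face check, then feature extraction) can be illustrated as follows. The text does not give the rule that decides frontality from key-point coordinates, so the heuristic here, which checks that the nose lies roughly midway between the eyes horizontally, is purely an assumption, as are the key-point names and the `tolerance` value.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def is_frontal(keypoints: Dict[str, Point], tolerance: float = 0.2) -> bool:
    """Hypothetical frontal test: the nose sits near the horizontal midpoint of the eyes."""
    left = keypoints["left_eye"]
    right = keypoints["right_eye"]
    nose = keypoints["nose"]
    eye_span = right[0] - left[0]
    if eye_span <= 0:
        return False  # degenerate or mirrored key points
    offset = (nose[0] - left[0]) / eye_span  # 0.5 when the nose is exactly midway
    return abs(offset - 0.5) <= tolerance


def should_extract_features(occluded: bool, keypoints: Dict[str, Point]) -> bool:
    """Feature extraction runs only for an unoccluded, frontal face."""
    if occluded:
        return False
    return is_frontal(keypoints)
```

Rejecting occluded or turned faces at registration time keeps low-quality reference features out of the server's feature store.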

Correspondingly, an embodiment of the present application further provides a terminal, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above-mentioned terminal-side identification method.

Embodiments of the present application further provide a computer-readable medium, where a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the above-mentioned terminal-side identification method are implemented.

Correspondingly, an embodiment of the present application further provides a server, including a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above-mentioned server-side identification method.

Embodiments of the present application further provide a computer-readable medium, where a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the foregoing server-side identification method are implemented.

Referring to FIG. 7 , a structural diagram of an identity recognition system according to an embodiment of the present application is shown.

An embodiment of the present application further provides an identity recognition system, including a camera 701 , a second terminal 702 , a third terminal 703 , the above-mentioned first terminal 500 and the above-mentioned server 600 .

The camera 701 is configured to capture the video stream and send the video stream to the first terminal 500; the second terminal 702 is configured to receive the identity information of the target user sent by the server 600, so as to issue a reminder of the target user's visit; the third terminal 703 is configured to send the face image and the identity information of the registered user to the server 600, and to receive the registration error information returned by the server 600 when the second face feature point included in the face image is already stored in the server 600.

In the embodiment of the present application, the camera 701 sends the video stream to the first terminal 500, the first terminal extracts the video frame from the video stream, and selects the first image to be recognized and the second image to be recognized from the video frame. The video frame located after the first to-be-recognized image is the second to-be-recognized image.

The first terminal 500 inputs the first to-be-recognized image into the target detection model, obtains the first face region where the face in each frame of the first to-be-recognized image is located, selects the target image from the frames of the first to-be-recognized image, and determines the first face area corresponding to the target image as the target face area. It then detects the first face key point in the target face area, determines the coordinate position of the first face key point, and, according to that coordinate position, intercepts the area including the first face key point from the target image to obtain a cropped image.
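The key-point-based cropping can be sketched as taking the bounding box of the key points, padding it, and clamping it to the image. The `margin` padding, the pixel rounding and the list-of-rows image representation are assumptions chosen to keep the example self-contained; the text only requires that the cropped area include the first face key point.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
CropBox = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_box(keypoints: Sequence[Point],
             image_size: Tuple[int, int],
             margin: float = 0.25) -> CropBox:
    """Bounding box of the key points, padded by `margin` and clamped to the image."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    x1, x2 = min(xs), max(xs)
    y1, y2 = min(ys), max(ys)
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    width, height = image_size
    left = max(0, int(x1 - pad_x))
    top = max(0, int(y1 - pad_y))
    right = min(width, int(x2 + pad_x) + 1)
    bottom = min(height, int(y2 + pad_y) + 1)
    return left, top, right, bottom


def crop(image_rows: List[list], box: CropBox) -> List[list]:
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image_rows[top:bottom]]
```

Sending only this cropped area to the server, rather than the whole frame, reduces the amount of data transmitted for each recognition request.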

The first terminal 500 sends the cropped image to the server 600. The server 600 extracts the first facial feature point in the cropped image and compares it with the facial feature points stored in the facial feature database to determine the user identification corresponding to the target user. The server 600 then queries the identity information of the target user from the user identity information database according to the user identification, and sends the identity information of the target user to the second terminal 702, so that a reminder of the target user's visit is issued through the second terminal 702.

In addition, the server 600 also sends the user identification corresponding to the target user to the first terminal 500, and the first terminal 500 caches the user identification and the target face area corresponding to the user identification. The first terminal 500 also inputs the second to-be-recognized image into the target detection model to obtain the second face region where the face in the second to-be-recognized image is located, and then calculates the intersection ratio between the target face area and the second face region corresponding to each of the N consecutively extracted frames of the second to-be-recognized image, so as to track the target user.

In addition, the registered user can also send the face image and the identity information of the registered user to the server 600 through the third terminal 703 to register the user information. When the registration is successful, the server 600 stores the second face feature point in the face image in the face feature database, and stores the registered user's identity information in the user identity information database.

In actual use, the camera can be deployed in a bank business hall or other occasions where identification of the target user's identity information is required.

In the embodiment of the present application, at least one frame of the first to-be-recognized image is extracted from the video stream and the target image is selected from it; the first face area corresponding to the target image is determined as the target face area; the target image is then cropped according to the target face area, and the cropped image is sent to the server. The server extracts the first facial feature point in the cropped image, compares the first facial feature point with the facial feature points stored in the server, determines the user ID corresponding to the target user, and then obtains the identity information of the target user corresponding to the user ID. When the collected video stream contains the face of the target user, the cropped image also contains that face, so the identity information of the target user can be obtained in time based on the cropped image. The staff can then customize a marketing plan in time according to the user's identity information, which improves marketing efficiency and the user's experience.

As for the apparatus embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for related parts.

The device embodiments described above are only illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place, or may be distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the computing processing device according to the embodiments of the present application. The present application can also be implemented as a device or device program (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from Internet sites, or provided on carrier signals, or in any other form.

For example, FIG. 8 shows a computing processing device, such as the aforementioned server 600 or terminal 500, that can implement the method according to the present application. The computing processing device conventionally includes a processor 810 and a computer program product or computer-readable medium in the form of a memory 820. The memory 820 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, hard disk, or ROM. The memory 820 has a storage space 830 for program code 831 for performing any of the method steps described above. For example, the storage space 830 may include individual program codes 831, each implementing one of the steps of the above methods. These program codes can be read from or written to one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 9. The storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 820 in the computing processing device of FIG. 8. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 831', i.e., code that can be read by a processor such as the processor 810 and that, when executed by a computing processing device, causes the computing processing device to perform the steps of the methods described above.

Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with an embodiment is included in at least one embodiment of the present application. Also, please note that instances of the phrase "in one embodiment" herein are not necessarily all referring to the same embodiment.

In the description provided herein, numerous specific details are set forth. It will be understood, however, that the embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third, etc. do not denote any order. These words can be interpreted as names.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that modifications may still be made to the technical solutions recorded in the foregoing embodiments, or some of their technical features may be equivalently replaced; such modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

An identity recognition method, characterized in that, applied to a first terminal, the method comprising:

extracting at least one frame of the first image to be identified from the video stream;

Based on the target detection model, detect the first face region where the face in the first to-be-recognized image of each frame is located;

Select a target image from the first to-be-recognized images of each frame, and determine the first face region corresponding to the target image as the target face region;

According to the target face region, the target image is cropped to obtain a cropped image;

Sending the cropped image to a server, so as to identify the cropped image by the server to determine the identity information of the target user.
The method according to claim 1, wherein the step of extracting at least one frame of the first image to be recognized from the video stream comprises:

extracting at least two frames of the first to-be-identified image from the video stream every preset number of frames;

The step of selecting a target image from the first to-be-recognized images of each frame, and determining the first face region corresponding to the target image as the target face region, includes:

For the at least two frames of the first to-be-recognized images, calculating the intersection ratio between the first face region corresponding to each frame of the first to-be-recognized image and the first face region corresponding to the adjacent previous frame of the first to-be-recognized image;

When the intersection ratios corresponding to the at least two frames of the first to-be-recognized images are all greater than or equal to the first set threshold, selecting any frame of the first to-be-recognized image from the at least two frames of the first to-be-recognized images as the target image, and determining the first face area corresponding to the target image as the target face area.
The method according to claim 1, wherein the step of cropping the target image according to the target face region to obtain the cropped image comprises:

Detecting the coordinate position corresponding to the first face key point in the target face area;

According to the coordinate position corresponding to the first face key point, an area including the first face key point is intercepted from the target image to obtain the cropped image.
The method according to claim 1, wherein after the step of sending the cropped image to a server, so as to identify the cropped image by the server to determine the identity information of the target user, the method further comprises:

Receive the user identification of the target user sent by the server, and store the user identification and the target face area corresponding to the user identification;

Extract N frames of second to-be-recognized images in sequence from the video stream; the second to-be-recognized image is a video frame located after the first to-be-recognized image in the video stream, and N is a positive integer greater than 1;

Based on the target detection model, detect the second face region where the face in the second to-be-recognized image of each frame is located;

Calculate the intersection ratio of the target face region and the second face region corresponding to the second to-be-recognized image of each frame;

When there is a second face region whose intersection ratio with the target face region is greater than or equal to the second set threshold in the second face region corresponding to the second to-be-recognized image of N frames continuously extracted, performing the steps of sequentially extracting N frames of the second to-be-recognized image from the video stream and subsequent steps;

When the intersection ratios between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second to-be-recognized image are all smaller than the second set threshold, delete the user identification and the target face region, and re-execute the step of extracting at least one frame of the first image to be recognized from the video stream and the subsequent steps.
The method according to claim 1, wherein the step of detecting the first face region where the face in the first to-be-recognized image of each frame is located based on the target detection model comprises:

compressing each frame of the first image to be identified, so that the size of the first image to be identified after compression is smaller than the size of the first image to be identified before compression;

Inputting the compressed first image to be recognized in each frame into the SSD model to obtain a first face region where the face in the first image to be recognized is located in each frame.
An identity recognition method, characterized in that, applied to a server, the method comprising:

receiving the cropped image sent by the first terminal; the cropped image is obtained by the first terminal cropping the target image according to the target face region;

extracting the first face feature point in the cropped image;

Comparing the first facial feature point with the facial feature point stored in the server to determine the user ID corresponding to the target user;

Acquire the identity information of the target user corresponding to the user identifier.
The method according to claim 6, wherein after the step of acquiring the identity information of the target user corresponding to the user identifier, the method further comprises:

The identity information of the target user is sent to the second terminal, so as to remind the target user of visiting through the second terminal.
The method according to claim 6, wherein, after the step of comparing the first facial feature point with the facial feature points stored in the server to determine the user identifier corresponding to the target user, the method further comprises:

The user identification of the target user is sent to the first terminal, so that the first terminal calculates the intersection ratio between the target face area and the second face area corresponding to each of the N consecutively extracted frames of the second to-be-recognized image, thereby tracking the target user; N is a positive integer greater than 1.
The method according to claim 6, wherein, before the step of comparing the first facial feature point with the facial feature points stored in the server to determine the user identifier corresponding to the target user, the method further comprises:

Receive the face image and the identity information of the registered user sent by the third terminal;

Based on the target detection model, detect the third face region where the face in the face image is located;

extracting the second face feature points in the third face region;

Comparing the second face feature point with the face feature point stored in the server to determine whether the second face feature point is stored in the server;

When the second face feature point is stored in the server, returning registration error information to the third terminal;

When the second face feature point is not stored in the server, save the second face feature point and the identity information of the registered user, and generate a user ID associated with both the second face feature point and the identity information of the registered user.
The method according to claim 9, characterized in that, after the step of detecting the third face region where the face in the face image is located based on the target detection model, further comprising:

Based on the face occlusion model, determine whether the face in the third face area is occluded;

When the face in the third face area is not blocked, detect the coordinate position corresponding to the second face key point in the third face area;

According to the coordinate positions corresponding to the key points of the second face, determine whether the face in the third face area is a frontal face;

When the face in the third face region is a frontal face, the steps of extracting the second face feature point in the third face region and subsequent steps are performed.
A terminal, wherein the terminal is a first terminal, and the first terminal includes:

a first to-be-recognized image extraction module, configured to extract at least one frame of the first to-be-recognized image from the video stream;

a first face region detection module, configured to detect the first face region where the face in the first to-be-recognized image of each frame is located based on the target detection model;

a target face area determination module, configured to select a target image from the first to-be-recognized images of each frame, and determine the first face area corresponding to the target image as the target face area;

a target image cropping module, configured to crop the target image according to the target face region to obtain a cropped image;

The cropped image sending module is configured to send the cropped image to a server, so as to identify the cropped image by the server to determine the identity information of the target user.
A server, characterized in that it includes:

a cropped image receiving module, configured to receive a cropped image sent by the first terminal; the cropped image is obtained by the first terminal after cropping the target image according to the target face region;

a first face feature point extraction module, configured to extract the first face feature point in the cropped image;

a user identification determining module, configured to compare the first facial feature points with the facial feature points stored in the server, and determine the user identification corresponding to the target user;

The identity information acquisition module is configured to acquire the identity information of the target user corresponding to the user identification.
A terminal, characterized by comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the identification method according to any one of claims 1 to 5.
A computer-readable medium, characterized in that a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the identification method according to any one of claims 1 to 5 are implemented.
A server, characterized in that it comprises a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the identification method according to any one of claims 6 to 10.
A computer-readable medium, characterized in that a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the identification method according to any one of claims 6 to 10 are implemented.
An identity recognition system, characterized by comprising a camera, a second terminal, a third terminal, the first terminal as claimed in claim 11 and the server as claimed in claim 12;

Wherein, the camera is configured to capture a video stream and send the video stream to the first terminal;

The second terminal is configured to receive the identity information of the target user sent by the server, so as to remind the target user of the visit;

The third terminal is configured to send the face image and the identity information of the registered user to the server, and, when the second face feature point included in the face image is already stored in the server, receive the registration error information returned by the server.