WO2022140879A1 - Identity recognition method, terminal, server, and system - Google Patents



Publication number
WO2022140879A1
Authority
WO
WIPO (PCT)
Application number
PCT/CN2020/139827
Inventor
许景涛 (Xu Jingtao)
Original Assignee
京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Application filed by 京东方科技集团股份有限公司 (BOE Technology Group Co., Ltd.)
Priority to CN202080003701.4A (CN115066712A)
Priority to PCT/CN2020/139827 (WO2022140879A1)
Publication of WO2022140879A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis

Definitions

  • the present application relates to the field of computer technology, and in particular, to an identification method, terminal, server and system.
  • the user needs to take a queue number from the automatic number dispenser and then wait at the corresponding counter to handle business in number order.
  • the user presents the relevant identity documents only at the counter, so only the staff there can learn the user's identity information. For example, for VIP users of a bank, the identity information of a VIP user can only be determined when that user goes to the counter to conduct business.
  • an identity recognition method applied to a first terminal, the method including:
  • the target image is cropped to obtain a cropped image
  • an identity recognition method applied to a server, the method including:
  • the cropped image is obtained by the first terminal cropping the target image according to the target face region;
  • a third aspect provides a terminal, where the terminal is a first terminal, and the first terminal includes:
  • a first to-be-recognized image extraction module configured to extract at least one frame of the first to-be-recognized image from the video stream;
  • a first face region detection module configured to detect, based on the target detection model, the first face region in which the face is located in each frame of the first to-be-recognized image;
  • a target face region determination module configured to select a target image from the frames of the first to-be-recognized image, and determine the first face region corresponding to the target image as the target face region;
  • a target image cropping module configured to crop the target image according to the target face region to obtain a cropped image;
  • a cropped image sending module configured to send the cropped image to a server, so that the server identifies the cropped image to determine the identity information of the target user.
  • a server including:
  • a cropped image receiving module configured to receive a cropped image sent by the first terminal; the cropped image is obtained by the first terminal after cropping the target image according to the target face region;
  • a first face feature point extraction module configured to extract the first face feature point in the cropped image
  • a user identification determining module configured to compare the first facial feature points with the facial feature points stored in the server, and determine the user identification corresponding to the target user;
  • an identity information acquisition module configured to acquire the identity information of the target user corresponding to the user identification.
  • a terminal comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the above-mentioned identity recognition method.
  • a computer-readable medium where a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the above-mentioned identification method are implemented.
  • a server comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the above-mentioned identity recognition method.
  • a computer-readable medium is provided, and a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the above-mentioned identification method are implemented.
  • an identification system including a camera, a second terminal, a third terminal, the above-mentioned first terminal and the above-mentioned server;
  • the camera is configured to capture a video stream and send the video stream to the first terminal;
  • the second terminal is configured to receive the identity information of the target user sent by the server, so as to give a reminder of the target user's visit;
  • the third terminal is configured to send the face image and the identity information of a registered user to the server, and to receive the registration error message returned by the server when the second face feature points included in the face image are already stored in the server.
  • FIG. 1 schematically shows a flowchart of an identity recognition method according to an embodiment of the present application
  • FIG. 2 schematically shows a specific flowchart of an identity recognition method according to an embodiment of the present application
  • FIG. 3 schematically shows a flowchart of another identity identification method according to an embodiment of the present application
  • FIG. 4 schematically shows a flowchart of a registration process of facial feature points and identity information of registered users in the embodiment of the present application
  • FIG. 5 schematically shows a structural block diagram of a terminal according to an embodiment of the present application
  • FIG. 6 schematically shows a structural block diagram of a server according to an embodiment of the present application.
  • FIG. 7 schematically shows a structural diagram of an identity recognition system according to an embodiment of the present application.
  • Figure 8 schematically shows a block diagram of a computing processing device for performing the method according to the present application.
  • Figure 9 schematically shows a memory unit for holding or carrying program code implementing the method according to the application.
  • Referring to FIG. 1, a flowchart of an identity recognition method according to an embodiment of the present application is shown; the method is applied to a first terminal and may specifically include the following steps:
  • Step 101 Extract at least one frame of a first image to be recognized from a video stream.
  • the camera collects the video stream in real time and sends it to the first terminal; the first terminal reads the video stream collected by the camera in real time in a multi-threaded manner and extracts video frames from it.
  • every frame of the video stream can be extracted to obtain multiple video frames, or one frame can be extracted from the video stream every preset number of frames, e.g., one frame every 10 frames or one frame every 5 frames.
  • at least one frame of the first image to be recognized is selected from the obtained video frames. Specifically, only one video frame may be selected as the first image to be recognized, in which case one frame of the first image to be recognized is extracted from the video stream; alternatively, at least two video frames may be selected as the first image to be recognized, in which case at least two frames of the first image to be recognized are extracted from the video stream.
  • the duration of the video stream is 1s
  • the number of frames of images contained in the video stream per second is 60 frames
  • one frame of image is extracted from the video stream every 10 frames
  • 6 frames can be extracted from the video stream.
  • the first video frame is the 1st frame image in the video stream
  • the second video frame is the 11th frame image in the video stream
  • the third video frame is the 21st frame image in the video stream
  • the fourth video frame is the 31st frame image in the video stream
  • the fifth video frame is the 41st frame image in the video stream
  • the sixth video frame is the 51st frame image in the video stream.
  • the first video frame can be selected as the first image to be recognized, in which case the number of frames of the first image to be recognized is one; alternatively, the first, second and third video frames can be selected as the first image to be recognized, in which case the number of frames of the first image to be recognized is 3.
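The reading and sampling described in Step 101 can be sketched as follows. This is a minimal sketch, assuming a Python reader thread that forwards frames through a bounded `queue.Queue` while the consumer keeps one frame out of every `interval` frames; the function names `read_stream` and `sample_frames` are hypothetical, and the real camera feed is replaced by an iterable of frame labels so that the output matches the 60 fps example above.

```python
import queue
import threading
from typing import Iterable, List

_END = object()  # sentinel marking the end of the stream


def read_stream(frames: Iterable, out: queue.Queue) -> None:
    """Reader thread: forward every captured frame into the queue, then the end marker."""
    for frame in frames:
        out.put(frame)
    out.put(_END)


def sample_frames(frames: Iterable, interval: int) -> List:
    """Keep the 1st frame and then one frame out of every `interval` frames."""
    buffer: queue.Queue = queue.Queue(maxsize=64)  # bounded, so a slow consumer throttles the reader
    reader = threading.Thread(target=read_stream, args=(frames, buffer), daemon=True)
    reader.start()
    sampled = []
    position = 0  # 0-based position of the frame in the stream
    while True:
        frame = buffer.get()
        if frame is _END:
            break
        if position % interval == 0:
            sampled.append(frame)
        position += 1
    reader.join()
    return sampled


# One second of 60 fps video; frames are labelled 1..60 to match the example above.
print(sample_frames(range(1, 61), interval=10))  # [1, 11, 21, 31, 41, 51]
```

Decoupling reading from processing this way lets the capture side keep up with the camera even when detection on the consumer side is slower; the queue bound is an arbitrary illustrative value.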
  • Step 102 Based on the target detection model, detect the first face region in which the face is located in each frame of the first image to be recognized.
  • a pre-trained target detection model is stored in the first terminal, and the target detection model is a neural network model.
  • the target detection model is obtained by training multiple sample images and human-annotated face regions in each sample image.
  • after extracting at least one frame of the first image to be recognized from the video stream, the first terminal inputs each frame into the target detection model and obtains the first face region in which the face in that frame is located.
  • the first face area is a rectangular frame area
  • the detected first face area is represented by its position information, e.g., by the coordinates of the upper-left corner of the rectangular box together with the width and height of the box.
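The (upper-left corner, width, height) representation can be captured in a small data structure. A minimal sketch, assuming pixel units with the origin at the upper-left corner of the image; the class name `FaceRegion` and its helper methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FaceRegion:
    """Rectangular face region: upper-left corner (x, y) plus width and height, in pixels."""
    x: float
    y: float
    w: float
    h: float

    def corners(self) -> Tuple[float, float, float, float]:
        """Return (x1, y1, x2, y2): the upper-left and lower-right corners."""
        return self.x, self.y, self.x + self.w, self.y + self.h

    def area(self) -> float:
        """Area of the box; a degenerate box (negative size) counts as zero."""
        return max(self.w, 0.0) * max(self.h, 0.0)


region = FaceRegion(x=120, y=80, w=64, h=72)
print(region.corners())  # (120, 80, 184, 152)
print(region.area())     # 4608
```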
  • Step 103 Select a target image from the frames of the first image to be recognized, and determine the first face region corresponding to the target image as the target face region.
  • after detecting the first face region in which the face in each frame of the first image to be recognized is located, the first terminal selects a target image from these frames and determines the first face region corresponding to the target image as the target face region.
  • when only one frame of the first image to be recognized is extracted, that first image to be recognized is also the target image, and its first face area is also the target face area of the target image
  • when at least two frames of the first image to be recognized are extracted, one frame is selected from them as the target image according to the relationship between their first face regions, and the first face region corresponding to the target image is the target face region of the target image.
  • the number of frames of the first image to be recognized is 3 frames, which are the first video frame, the second video frame and the third video frame extracted from the video stream, and the third video frame can be selected as the target image, and the first face area of the third video frame is also the target face area of the target image.
  • Step 104 Crop the target image according to the target face region to obtain a cropped image.
  • the first terminal crops the target image according to the target face area to obtain a cropped image that contains the face in the target face area.
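Cropping by a face region amounts to slicing the image rows and columns covered by the rectangle. A minimal sketch, assuming the image is held as a list of pixel rows (a stand-in for whatever image type the first terminal actually uses) and that a region partly outside the frame is clamped to the image borders; the function name `crop` is hypothetical.

```python
from typing import List

Image = List[List[int]]  # rows of pixel values (assumed representation)


def crop(image: Image, x: int, y: int, w: int, h: int) -> Image:
    """Cut the (x, y, w, h) rectangle out of `image`, clamped to the image borders."""
    height = len(image)
    width = len(image[0]) if height else 0
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(width, x + w), min(height, y + h)
    if x1 >= x2 or y1 >= y2:
        return []  # the region lies completely outside the image
    return [row[x1:x2] for row in image[y1:y2]]


# 4 x 5 test image whose pixel value encodes its position (row * 10 + column).
img = [[r * 10 + c for c in range(5)] for r in range(4)]
print(crop(img, 1, 1, 2, 2))  # [[11, 12], [21, 22]]
```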
  • Step 105 Send the cropped image to a server, so that the server can identify the cropped image to determine the identity information of the target user.
  • the first terminal sends the cropped image to the server
  • the server receives the cropped image sent by the first terminal and extracts the first facial feature points from it; it then compares the first facial feature points with the facial feature points stored in the server to determine the user ID corresponding to the target user, and finally obtains the identity information of the target user corresponding to that user ID, thereby determining the target user's identity information.
  • the first facial feature points may be at least one feature point of the face, such as the nose, left eye, right eye or mouth. The user identification (user ID) is associated both with the target user's facial feature points and with the target user's identity information stored in the server; it is in effect an index number through which the facial feature points stored in the server are linked to the identity information stored in the server. The identity information of the target user includes the target user's name, age, gender, ID number, mobile phone number, occupation, education and other information.
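The server-side chain (feature comparison, then user ID, then identity information) can be sketched as a small registry. This is a minimal sketch under loudly stated assumptions: the feature points are flattened into fixed-length numeric vectors, "comparison" is nearest-neighbour search by Euclidean distance with a hypothetical `match_threshold`, and the class `FaceRegistry` with its methods is hypothetical. The `register` method also mirrors the third terminal's registration error for a face that is already stored.

```python
import math
from typing import Dict, List, Optional

FeatureVector = List[float]  # assumption: feature points flattened into one numeric vector


class FaceRegistry:
    """Hypothetical server-side store linking user IDs to features and identity information."""

    def __init__(self, match_threshold: float = 0.6) -> None:
        self.match_threshold = match_threshold  # largest distance still counted as a match (assumed)
        self.features: Dict[str, FeatureVector] = {}
        self.identities: Dict[str, dict] = {}

    def identify(self, query: FeatureVector) -> Optional[str]:
        """Return the user ID whose stored features are nearest to `query`, or None if none is close enough."""
        best_id, best_dist = None, math.inf
        for user_id, stored in self.features.items():
            dist = math.dist(query, stored)  # Euclidean distance
            if dist < best_dist:
                best_id, best_dist = user_id, dist
        return best_id if best_dist <= self.match_threshold else None

    def register(self, user_id: str, features: FeatureVector, identity: dict) -> None:
        """Store a new user; refuse when a matching face is already registered."""
        if self.identify(features) is not None:
            raise ValueError("registration error: face already registered")
        self.features[user_id] = features
        self.identities[user_id] = identity

    def lookup(self, query: FeatureVector) -> Optional[dict]:
        """Feature comparison -> user ID -> identity information."""
        user_id = self.identify(query)
        return None if user_id is None else self.identities[user_id]


registry = FaceRegistry(match_threshold=0.5)
registry.register("u001", [0.0, 0.0, 0.0], {"name": "User A", "occupation": "teacher"})
print(registry.lookup([0.1, 0.0, 0.0]))  # {'name': 'User A', 'occupation': 'teacher'}
```

A linear scan is only meant to show the data flow; a production system would more likely use an index built for nearest-neighbour search over many users.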
  • when the camera collects a video stream containing the face of the target user, the identity information of the target user can be determined directly through the cooperation of the first terminal and the server.
  • staff can then customize a marketing plan in time according to the user's identity information, improving marketing efficiency and the user experience.
  • the target image is cropped according to the target face area, and only the cropped image, rather than the whole target image, is sent to the server to identify the target user's identity information;
  • the purpose is to reduce the amount of computation on the server and relieve its computing load.
  • the extraction process of the target image and the cropping process of the cropped image are deployed on the first terminal, and the identification and comparison process of the first facial feature points and the query process of identity information are deployed on the server, so as to avoid integrating all functions in one device.
  • transmitting a video stream puts heavy pressure on network bandwidth; by deploying some functions on the first terminal and others on the server, this pressure can be effectively relieved, i.e., network bandwidth usage is reduced.
  • the first face area corresponding to the target image is determined as the target face area, the target image is then cropped according to the target face area, and the cropped image is sent to the server for face recognition to determine the identity information of the target user.
  • in this way, the identity information of the target user can be known in time, and staff can customize the marketing plan promptly according to the user's identity information, improving marketing efficiency and the user experience.
  • Referring to FIG. 2, a specific flowchart of an identity recognition method according to an embodiment of the present application is shown; the method is applied to the first terminal and may specifically include the following steps:
  • Step 201 Extract at least two frames of the first image to be recognized from the video stream, one frame every preset number of frames.
  • the camera collects the video stream in real time, and sends the collected video stream to the first terminal, and the first terminal can extract one frame of image from the video stream every preset number of frames to obtain multiple video frames, Then, at least two video frames are selected from the plurality of video frames as the first to-be-identified images, that is, at least two frames of the first to-be-identified images are extracted from the video stream.
  • any two adjacent frames of the first image to be recognized are consecutive extracted video frames, and in the video stream they are actually separated by the preset number of frames.
  • for example, 3 frames of the first image to be recognized are extracted from the video stream, namely the first, second and third video frames
  • the first video frame is the 1st frame image in the video stream, the second video frame is the 11th frame image, and the third video frame is the 21st frame image.
  • the first, second and third video frames are consecutive, with no other extracted video frames between them, while two adjacent frames of the first image to be recognized are actually 10 frames apart in the video stream, which is the preset number of frames.
  • Step 202 Perform compression processing on each frame of the first image to be identified, so that the size of the first image to be identified after compression is smaller than the size of the first image to be identified before compression.
  • after extracting at least two frames of the first image to be recognized from the video stream, the first terminal compresses each frame so that the size of the compressed first image to be recognized is smaller than its size before compression.
  • the first to-be-recognized image may be reduced proportionally.
  • for example, if the first image to be recognized has width W and height H before compression, it has width W/3 and height H/3 after compression.
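A proportional reduction by a factor of 3 can be sketched as below. This is a minimal sketch, assuming nearest-neighbour sampling (the text only says the image is reduced proportionally, not how) and a list-of-rows image; the function names are hypothetical. If the later crop is taken from the uncompressed frame, which is also an assumption, a box detected on the small image has to be scaled back by the same factor, as `box_to_original` shows.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values (assumed representation)
Box = Tuple[int, int, int, int]  # (x, y, w, h)


def downscale(image: Image, factor: int = 3) -> Image:
    """Shrink by an integer factor with nearest-neighbour sampling: W x H -> about W/factor x H/factor."""
    return [row[::factor] for row in image[::factor]]


def box_to_original(box: Box, factor: int = 3) -> Box:
    """Scale an (x, y, w, h) box found on the compressed image back to original-image pixels."""
    x, y, w, h = box
    return (x * factor, y * factor, w * factor, h * factor)


frame = [[0] * 9 for _ in range(6)]  # W = 9, H = 6
small = downscale(frame)
print(len(small[0]), len(small))          # 3 2
print(box_to_original((10, 20, 30, 40)))  # (30, 60, 90, 120)
```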
  • Step 203 Input the compressed first image to be recognized in each frame into the SSD model to obtain a first face region where the face in the first image to be recognized is located in each frame.
  • a pre-trained target detection model is stored in the first terminal; the target detection model is a neural network model, for which an SSD (Single Shot MultiBox Detector) model can be used.
  • after compressing each frame of the first image to be recognized, the first terminal can input each compressed frame into the SSD model, which outputs the position information of the first face region in which the face in that frame is located, together with the category corresponding to that face region.
  • compressing the images before detection shortens the detection time.
  • the SSD model includes a backbone network, a multi-scale detection sub-network and an NMS (Non-Maximum Suppression) network connected in sequence.
  • the backbone network can be a deep convolutional neural network, such as VGG16. The multi-scale detection sub-network consists of several convolutional layers connected in sequence, each of which is connected to a classification network layer and a position regression network layer.
  • the compressed first image to be recognized is input into the SSD model. First, the backbone network extracts features from it to obtain a feature map; the feature map is then passed through the multiple convolutional layers to obtain, for preselected boxes of different scales and aspect ratios, the category probability values (scores) and position offsets (locations).
  • next, the category probability values, position offsets and preselected boxes of all preselected boxes are input into the classification network layer and the position regression network layer for classification and regression; finally, the classified and regressed preselected boxes are input into the NMS network, which eliminates redundant preselected boxes and selects the preselected box with the highest confidence as the rectangular box corresponding to the first face region.
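The NMS stage can be illustrated with the common greedy variant. A minimal sketch, assuming boxes in (x, y, w, h) form and a hypothetical overlap threshold of 0.45; the SSD model's own NMS layer may differ in details. Boxes are visited in descending confidence, and any box overlapping an already kept box too much is discarded, so the first kept index is the highest-confidence box used as the face region.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of the kept boxes, highest confidence first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already kept (more confident) box too much.
        if all(iou(boxes[i], boxes[k]) < iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (50, 50, 10, 10)]
scores = [0.8, 0.9, 0.7]
keep = nms(boxes, scores)
print(keep)            # [1, 2]
print(boxes[keep[0]])  # (1, 1, 10, 10), the highest-confidence face box
```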
  • Step 204 For the at least two frames of the first image to be recognized, calculate the intersection ratio (intersection over union, IoU) between the first face region corresponding to each frame and the first face region corresponding to the adjacent previous frame.
  • the intersection ratio of two first face regions is the area of their overlapping region divided by the area of their union, i.e., the total area covered by the two regions.
  • for example, 3 frames of the first image to be recognized are extracted from the video stream, namely the 1st, 11th and 21st frame images in the video stream, and their first face regions are first face region 1, first face region 2 and first face region 3, respectively
  • the intersection ratio of first face region 1 and first face region 2 and the intersection ratio of first face region 2 and first face region 3 are then calculated.
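Written as a formula, with A and B denoting two face regions and |·| denoting area (notation introduced here for illustration), together with a worked example on two hypothetical 10 × 10 boxes given in (x, y, w, h) form:

```latex
\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}

% Example: A = (0, 0, 10, 10), B = (5, 0, 10, 10), so the boxes overlap in a 5 x 10 strip
|A \cap B| = 5 \times 10 = 50, \qquad
\mathrm{IoU}(A, B) = \frac{50}{100 + 100 - 50} = \frac{1}{3} \approx 0.33
```

With the example first set threshold of 0.5 used in Step 205, this pair would not pass.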
  • Step 205 When the intersection ratios corresponding to the at least two frames of the first image to be recognized are all greater than or equal to a first set threshold, select any one frame from them as the target image, and determine the first face region corresponding to the target image as the target face region.
  • when the intersection ratios corresponding to the at least two frames of the first image to be recognized are all greater than or equal to the first set threshold, it is determined that the first terminal has not misrecognized the face in the first image to be recognized, i.e., the detected first face region in which the face is located is accurate.
  • in that case, any frame is selected from the at least two frames of the first image to be recognized as the target image, and the first face region corresponding to the target image is determined as the target face region.
  • for example, the last frame of the first image to be recognized is selected as the target image;
  • the first set threshold can be set manually, for example, the first set threshold is 0.5.
  • for example, the intersection ratio between first face region 1 and first face region 2 is 0.6, and that between first face region 2 and first face region 3 is 0.8; both are greater than the first set threshold of 0.5
  • since the three frames of the first image to be recognized are the 1st, 11th and 21st frame images in the video stream, the 21st frame image is selected as the target image, and its first face region 3 is used as the target face region.
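Steps 204 and 205 together can be sketched as one selection function. A minimal sketch, assuming (x, y, w, h) boxes, the example first set threshold of 0.5, and the "pick the last frame" choice from the example above; returning `None` when the check fails (so the caller re-extracts frames) is an assumed behaviour the text does not spell out, and `select_target` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def select_target(regions: List[Box], first_threshold: float = 0.5) -> Optional[Tuple[int, Box]]:
    """Return (frame index, target face region) for the last frame if every pair of
    adjacent first face regions has IoU >= first_threshold; otherwise None."""
    if len(regions) < 2:
        return None  # this embodiment works on at least two frames
    for prev, cur in zip(regions, regions[1:]):
        if iou(prev, cur) < first_threshold:
            return None  # a detection is inconsistent with its neighbour; likely not a real face
    return len(regions) - 1, regions[-1]


regions = [(0, 0, 10, 10), (1, 0, 10, 10), (2, 0, 10, 10)]
print(select_target(regions))  # (2, (2, 0, 10, 10))
```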
  • Step 206 Detect the coordinate position corresponding to the first face key point in the target face area.
  • after determining the target image and its target face area, the first terminal detects the first face key points in the target face area through a face detection algorithm and determines the coordinate positions corresponding to the first face key points.
  • the first face key points include key points such as left eye, right eye, nose, left mouth corner, right mouth corner, etc.
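  • the coordinate positions corresponding to the first face key points include the coordinate positions of the left eye, right eye, nose, left mouth corner and right mouth corner in the target image.
  • because the key points are detected only within the target face area, the amount of computation on the first terminal can be reduced.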
  • Step 207 According to the coordinate positions corresponding to the first face key points, cut out an area including the first face key points from the target image to obtain the cropped image.
  • after detecting the coordinate positions of the first face key points in the target face area, the first terminal cuts out, according to these coordinate positions, the area of the target image that contains the first face key points as the cropped image.
  • the target image is cropped according to the preset cropping size, so that the size of the cropped image is the preset cropping size, and the cropped image includes all the key points of the first face.
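Cropping to a preset size around the key points can be sketched as placing a fixed-size window and checking that every key point falls inside it. A minimal sketch, assuming the window is centred on the bounding box of the key points and shifted to stay within the image; the centring rule, the 112 × 112 size in the example, and the function name `crop_window` are all hypothetical. The returned (x, y, w, h) can then be passed to any ordinary crop routine.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[int, int]  # (x, y) in target-image pixels


def crop_window(keypoints: Dict[str, Point], crop_w: int, crop_h: int,
                img_w: int, img_h: int) -> Optional[Tuple[int, int, int, int]]:
    """Place a crop_w x crop_h window centred on the key points, shifted to stay inside the image.

    Returns (x, y, w, h), or None if the window cannot contain every key point.
    """
    xs = [p[0] for p in keypoints.values()]
    ys = [p[1] for p in keypoints.values()]
    cx = (min(xs) + max(xs)) // 2  # centre of the key-point bounding box
    cy = (min(ys) + max(ys)) // 2
    x = min(max(cx - crop_w // 2, 0), max(img_w - crop_w, 0))
    y = min(max(cy - crop_h // 2, 0), max(img_h - crop_h, 0))
    inside = all(x <= px < x + crop_w and y <= py < y + crop_h for px, py in keypoints.values())
    return (x, y, crop_w, crop_h) if inside else None


points = {"left_eye": (100, 120), "right_eye": (140, 120), "nose": (120, 140),
          "mouth_left": (105, 160), "mouth_right": (135, 160)}
print(crop_window(points, 112, 112, img_w=640, img_h=480))  # (64, 84, 112, 112)
```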
  • Step 208 Send the cropped image to a server, so that the server can identify the cropped image to determine the identity information of the target user.
  • The principle of this step is similar to that of Step 105 in the first embodiment above, and details are not repeated here.
  • after Step 208, the method further includes the following steps: receiving the user ID of the target user sent by the server, and storing the user ID and the target face region corresponding to it; sequentially extracting N frames of a second image to be recognized from the video stream, where the second image to be recognized is a video frame located after the first image to be recognized in the video stream and N is a positive integer greater than 1; detecting, based on the target detection model, the second face region in which the face is located in each frame of the second image to be recognized; and calculating the intersection ratio between the target face region and the second face region corresponding to each frame of the second image to be recognized.
  • when, among the second face regions of the N consecutively extracted frames of the second image to be recognized, there is one whose intersection ratio with the target face region is greater than or equal to a second set threshold, the step of sequentially extracting N frames of the second image to be recognized and the subsequent steps are performed again; when the intersection ratios between the second face regions of the N consecutively extracted frames and the target face region are all less than the second set threshold, the user ID and the target face region are deleted, and the step of extracting at least one frame of the first image to be recognized from the video stream and the subsequent steps are re-executed.
  • the server extracts the first facial feature points from the cropped image, compares them with the facial feature points stored in the server to determine the user ID corresponding to the target user, and sends that user ID to the first terminal; the first terminal receives the user ID of the target user and caches it together with the target face region.
  • since the first terminal acquires the video stream captured by the camera in real time and splits it into multiple video frames as required, a video frame located after the first image to be recognized in the video stream is called a second image to be recognized, so N frames of the second image to be recognized can be extracted sequentially from the video stream.
  • for example, the first image to be recognized includes the first, second and third video frames, which are the 1st, 11th and 21st frame images in the video stream, respectively; the video frames after the third video frame, such as the fourth, fifth and sixth video frames, are therefore second images to be recognized
  • the fourth, fifth and sixth video frames are the 31st, 41st and 51st frame images in the video stream, respectively.
  • after extracting the second images to be recognized, the first terminal inputs each frame into the target detection model to obtain the second face region in which the face in each frame of the second image to be recognized is located.
  • the target detection model may also be an SSD model, and the detection process of the second face region in the second to-be-recognized image is similar to the detection process of the first human-face region in the first to-be-recognized image, which will not be repeated here.
  • the first terminal calculates the intersection ratio between the cached target face region and the second face region corresponding to the second to-be-recognized image of each frame.
  • the target face region is determined.
  • the target user corresponding to the area has not missed tracking.
  • the intersection ratio of the second face region corresponding to the N frames of the second to-be-recognized image continuously extracted and the target face region is smaller than the second set threshold, it is determined that the target user corresponding to the target face region has missed tracking , at this time, delete the cached user ID and the target face area, and re-execute step S201 and subsequent steps, that is, re-extract the first image to be recognized from the video stream, and detect where the face in the first image to be recognized is located.
  • the first face area is selected from the target image and the target face area, and then the target image is cropped, and the cropped image is sent to the server to identify the identity information of the target user.
  • the second set threshold and the first set threshold may be equal or unequal; the number of frames N of the second to-be-recognized images continuously extracted, and the specific value of N can be manually set according to the actual situation, such as setting N for 20 frames.
  • the server after determining the user identifier of the target user, the server sends the user identifier to the first terminal, and the first terminal stores the user identifier and the target face area, and based on the target face area, analyzes subsequent slave video streams from the video stream.
  • the extracted second to-be-recognized image is judged to realize the tracking of the target user.
  • While tracking is maintained, the first terminal does not crop the second to-be-recognized images and does not send cropped images to the server to identify the target user's identity information again. This prevents the terminal from sending every cropped image to the server, which would substantially increase the server's computing load.
  • The speed of identifying each target user's identity information is improved accordingly; when tracking of a target user is lost, the target user's identity information is detected again by the first terminal and the server.
  • In summary, in this embodiment of the present application, at least two frames of the first to-be-recognized image are extracted from the video stream, and for each frame the intersection ratio between its first face region and the first face region of the adjacent previous frame is calculated.
  • A target image is selected from the at least two frames of the first to-be-recognized image, and the first face region corresponding to the target image is determined as the target face region.
  • The coordinate positions of the first face key points in the target face region are then detected, the target image is cropped according to these coordinate positions, and the cropped image is sent to the server for face recognition to determine the identity information of the target user. Thus, when the captured video stream contains the target user's face, the target user's identity information can be known promptly, and staff can tailor a marketing plan in time according to that information, improving marketing efficiency and user experience.
  • Moreover, the target image is selected only when the intersection ratios corresponding to the at least two frames of the first to-be-recognized image are all greater than or equal to the first set threshold. This prevents the target detection model from mistakenly determining the region where another object is located as a first face region, which would otherwise prevent the target user's identity information from being detected later.
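The target-image selection rule can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes one detected face per sampled frame, boxes in `(x1, y1, x2, y2)` form, and IoU as the intersection ratio. Because the text allows "any frame" to be chosen once all ratios pass, the sketch simply returns the most recent one. The names `iou` and `select_target_index` are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_target_index(face_regions: Sequence[Box], first_threshold: float) -> Optional[int]:
    """Return the index of the frame chosen as the target image, or None.

    face_regions[i] is the first face region detected in the i-th sampled
    first to-be-recognized image (one face per frame is assumed here).
    """
    if len(face_regions) < 2:
        return None  # the method needs at least two frames
    for prev, cur in zip(face_regions, face_regions[1:]):
        # Each frame is compared with the adjacent previous frame.
        if iou(cur, prev) < first_threshold:
            return None  # unstable region: possibly a false detection
    # Any frame may serve as the target image; the most recent one is used here.
    return len(face_regions) - 1
```

Returning `None` leaves it to the caller to sample more frames, which matches the idea that a jittering "face" box is treated as unreliable.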
  • Referring to FIG. 3, a flowchart of another identity recognition method according to an embodiment of the present application is shown. The method is applied to a server and may specifically include the following steps:
  • Step 301: Receive a cropped image sent by the first terminal. The cropped image is obtained by the first terminal by cropping the target image according to the target face region.
  • The first terminal acquires the video stream captured by the camera in real time, extracts at least one frame of the first to-be-recognized image from the video stream, and then, based on the target detection model, detects the first face region where the face in each frame of the first to-be-recognized image is located.
  • It then selects the target image from these frames, determines the first face region corresponding to the target image as the target face region, crops the target image according to the target face region to obtain the cropped image, and finally sends the cropped image to the server.
  • The server receives the cropped image sent by the first terminal, which the first terminal obtained by cropping the target image according to the target face region.
  • The cropped image includes the first face key points in the target face region of the target image.
  • Step 302: Extract the first face feature points from the cropped image.
  • After receiving the cropped image sent by the first terminal, the server extracts the first face feature points from it. The first face feature points may be one or more feature points of the face, such as the nose, left eye, right eye, and mouth.
  • Step 303: Compare the first face feature points with the face feature points stored in the server to determine the user identifier corresponding to the target user.
  • A face feature database and a user identity information database are set up in the server. The face feature database stores each user's face feature points, the user identity information database stores each user's identity information, and the server also stores each user's user ID.
  • The user IDs correspond one-to-one with the users' face feature points stored in the face feature database and with the users' identity information stored in the user identity information database.
  • After extracting the first face feature points from the cropped image, the server compares them with each set of face feature points stored in the face feature database. When the similarity between the first face feature points and a set of target face feature points stored in the database is greater than the similarity threshold, the first face feature points are determined to match the target face feature points, and the user ID corresponding to the target face feature points is queried. This user ID is the user ID of the target user corresponding to the cropped image.
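The matching in step 303 could look like the sketch below. The description does not say how "face feature points" are encoded or which similarity measure is used, so this assumes each user's features are a fixed-length numeric vector compared by cosine similarity. It also assumes the best-scoring user above the threshold is chosen when several exceed it. `cosine_similarity` and `match_user_id` are hypothetical names.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_user_id(
    query: Sequence[float],
    feature_db: Dict[str, Sequence[float]],
    similarity_threshold: float,
) -> Optional[str]:
    """Return the user ID whose stored features best match `query`,
    provided the similarity exceeds the threshold; otherwise None."""
    best_id: Optional[str] = None
    best_sim = similarity_threshold
    for user_id, stored in feature_db.items():
        sim = cosine_similarity(query, stored)
        if sim > best_sim:  # strictly greater than the threshold, as in the text
            best_id, best_sim = user_id, sim
    return best_id
```

A linear scan is enough to show the logic. A production system with many registered users would more likely use an indexed nearest-neighbour search, which the description does not address.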
  • Step 304: Obtain the identity information of the target user corresponding to the user identifier.
  • After obtaining the user identifier corresponding to the target user, the server can query the target user's identity information from the user identity information database according to that user identifier. This works because the user identifiers correspond one-to-one with the users' face feature points stored in the face feature database and with the users' identity information stored in the user identity information database.
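The one-to-one correspondence can be modelled by keying both stores on the same user ID, as in the sketch below. In-memory dictionaries stand in for the two databases, and the `IdentityInfo` fields are a hypothetical subset of those listed in the description. `ServerStore`, `add_user`, and `identity_for` are illustrative names, not part of the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IdentityInfo:
    """Hypothetical subset of the identity fields listed in the description."""
    name: str
    mobile: str


@dataclass
class ServerStore:
    # Both "databases" are keyed by the same user ID, which gives the
    # one-to-one correspondence described above.
    face_features: Dict[str, List[float]] = field(default_factory=dict)
    identities: Dict[str, IdentityInfo] = field(default_factory=dict)

    def add_user(self, user_id: str, features: List[float], info: IdentityInfo) -> None:
        if user_id in self.face_features or user_id in self.identities:
            raise ValueError(f"user ID {user_id!r} already exists")
        self.face_features[user_id] = features
        self.identities[user_id] = info

    def identity_for(self, user_id: str) -> Optional[IdentityInfo]:
        """Step 304: look up identity information by user ID."""
        return self.identities.get(user_id)
```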
  • The identity information of the target user includes information such as the target user's name, age, gender, ID card number, mobile phone number, occupation, and educational background.
  • The server may also compile statistics on each target user's visit times and number of visits, based on when and how many times that target user's identity information has been identified.
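A minimal sketch of such statistics follows. It assumes that every identification event counts as one visit, which could over-count if a user is re-identified after a tracking loss during the same visit; the description does not say how repeated identifications are merged. `VisitStats` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Optional


class VisitStats:
    """Records each identification of a user and derives visit statistics."""

    def __init__(self) -> None:
        self._times: DefaultDict[str, List[datetime]] = defaultdict(list)

    def record(self, user_id: str, identified_at: datetime) -> None:
        """Store the time at which the user's identity information was identified."""
        self._times[user_id].append(identified_at)

    def visit_count(self, user_id: str) -> int:
        # .get avoids creating empty entries for unknown users.
        return len(self._times.get(user_id, []))

    def last_visit(self, user_id: str) -> Optional[datetime]:
        times = self._times.get(user_id)
        return max(times) if times else None
```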
  • The method further includes: sending the identity information of the target user to a second terminal, so that the second terminal can give a reminder of the target user's visit.
  • After identifying the target user's identity information, the server sends it to the second terminal. The second terminal receives the identity information sent by the server and displays it on its display screen
  • to remind the relevant staff of the target user's visit.
  • Staff can view the target user's identity information through the second terminal and promptly formulate a marketing plan for the target user, improving marketing efficiency.
  • The second terminal may be a display screen deployed at the relevant venue, such as a display screen in a bank business hall. It may also be a terminal device designated by the relevant staff, such as a mobile phone or computer held by a product manager.
  • After step 303, the method further includes: sending the user identifier of the target user to the first terminal. The first terminal then calculates the intersection ratio between the target face region and the second face regions corresponding to N consecutively extracted frames of the second to-be-recognized image, thereby tracking the target user. N is a positive integer greater than 1.
  • After determining the user ID corresponding to the target user, the server sends the target user's user ID to the first terminal. The first terminal receives the user ID sent by the server
  • and caches the user ID together with the corresponding target face region.
  • The first terminal sequentially extracts N frames of the second to-be-recognized image from the video stream, where a second to-be-recognized image is a video frame located after the first to-be-recognized image in the video stream. Then, based on the target detection model, it detects the second face region where the face in each frame of the second to-be-recognized image is located, and calculates the intersection ratio between the target face region and the second face region of each frame.
  • Based on the intersection ratios between the target face region and the second face regions corresponding to the N frames of the second to-be-recognized image, the first terminal determines whether tracking of the target user has been lost. Specifically, if at least one of the second face regions corresponding to the N consecutively extracted frames has an intersection ratio with the target face region that is greater than or equal to the second set threshold, tracking of the target user corresponding to the target face region is determined not to be lost. If the intersection ratios between these second face regions and the target face region are all smaller than the second set threshold, tracking of that target user is determined to be lost.
  • If tracking of the target user is not lost, the first terminal continues with the step of sequentially extracting N frames of the second to-be-recognized image from the video stream and the subsequent steps. If tracking is lost, it deletes the cached user ID and target face region and re-executes the step of extracting the first to-be-recognized image from the video stream and the subsequent steps.
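The terminal-side cache and loss check can be sketched as a small state holder. This is one possible reading of the steps above, under stated assumptions. Frames are judged in consecutive batches of N, matching "sequentially extracting N frames", rather than in a sliding window. The cached target face region is not updated during tracking, since the description does not mention updating it. Boxes are `(x1, y1, x2, y2)`. `TerminalTracker` and its methods are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class TerminalTracker:
    """Cache of (user ID, target face region) with batch-of-N loss detection."""

    def __init__(self, n_frames: int, second_threshold: float) -> None:
        if n_frames <= 1:
            raise ValueError("N must be a positive integer greater than 1")
        self.n_frames = n_frames
        self.second_threshold = second_threshold
        self.user_id: Optional[str] = None
        self.target: Optional[Box] = None
        self._seen = 0     # frames examined in the current batch of N
        self._hit = False  # whether any frame in the batch matched the target

    @property
    def tracking(self) -> bool:
        return self.user_id is not None

    def cache(self, user_id: str, target: Box) -> None:
        """Store the user ID returned by the server with its target face region."""
        self.user_id, self.target = user_id, target
        self._seen, self._hit = 0, False

    def process_frame(self, boxes: Sequence[Box]) -> bool:
        """Feed the face boxes of one second to-be-recognized image.

        Returns True exactly when a batch of N frames ends without a match,
        i.e. tracking is lost; the cache is then cleared so that the caller
        restarts from extracting first to-be-recognized images.
        """
        if self.target is None:
            return False  # nothing cached: the caller should run detection instead
        self._seen += 1
        if any(iou(self.target, b) >= self.second_threshold for b in boxes):
            self._hit = True
        if self._seen < self.n_frames:
            return False
        lost = not self._hit
        self._seen, self._hit = 0, False  # start the next batch of N frames
        if lost:
            self.user_id, self.target = None, None
        return lost
```

While `tracking` is True, the terminal skips cropping and uploading, which is where the reduction in server load described above comes from.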
  • Tracking the target user on the first terminal reduces the computing load on the terminal and the server, and correspondingly speeds up identification of each target user's identity information.
  • Referring to FIG. 4, a flowchart of the process of registering a registered user's face feature points and identity information in an embodiment of the present application is shown. The process may specifically include the following steps:
  • Step 401: Receive the face image and identity information of the registered user sent by the third terminal.
  • The registered user can input his or her face image and identity information on the third terminal, which then sends them to the server. The server receives the face image and identity information of the registered user sent by the third terminal.
  • The face image sent by the third terminal may be captured in real time by the third terminal or pre-stored on it. The third terminal may be a terminal device deployed at the relevant venue that can capture face images, such as a terminal deployed in a bank business hall,
  • or it may be a terminal device held by the registered user, such as the registered user's mobile phone.
  • Step 402: Based on the target detection model, detect the third face region where the face in the face image is located.
  • A pre-trained target detection model is also stored in the server; the target detection model may be an SSD model.
  • After receiving the face image and identity information of the registered user sent by the third terminal, the server inputs the face image into the target detection model, which outputs the third face region where the face in the face image is located.
  • The third face region is a rectangular box region, and the detected third face region is represented by its position information.
  • Step 403: Extract the second face feature points in the third face region.
  • After detecting the third face region where the face in the face image is located, the server extracts the second face feature points in the third face region. These may be one or more feature points of the face, such as the nose, left eye, right eye, and mouth.
  • Step 404: Compare the second face feature points with the face feature points stored in the server to determine whether the second face feature points are already stored in the server.
  • The server checks whether the second face feature points match any set of face feature points stored in the server. This determines whether the second face feature points are already stored in the server, that is, whether the user has already registered.
  • Step 405: When the second face feature points are already stored in the server, return registration error information to the third terminal.
  • When the second face feature points match one of the sets of face feature points stored in the server, the second face feature points are determined to be stored in the server, and correspondingly the registered user is determined to have registered before. In this case, the server returns registration error information to the third terminal, reminding the registered user that registration has already been completed.
  • The registration error information, the identity information of the registered user, and the identity information stored in the server
  • for the face feature points that match the second face feature points may also be sent to a terminal device of the relevant staff, to remind the staff to handle the case promptly, for example by registering the user manually.
  • Step 406: When the second face feature points are not stored in the server, save the second face feature points and the identity information of the registered user, and generate a user ID associated with both.
  • When the second face feature points match none of the sets of face feature points stored in the server, the second face feature points are determined not to be stored in the server. The server then saves the second face feature points in the face feature database and the identity information of the registered user in the user identity information database, and generates a user ID associated with both the second face feature points and the registered user's identity information.
  • The first terminal, the second terminal, and the third terminal are different terminals. The first terminal is a development board attached to the back end of the camera, such as an RK3399 development board.
  • After step 402, the method further includes the following steps. Based on the face occlusion model, determine whether the face in the third face region is occluded. When it is not occluded, detect the coordinate positions corresponding to the second face key points in the third face region. Then determine, according to these coordinate positions, whether the face in the third face region is a frontal face. When it is a frontal face, perform the step of extracting the second face feature points in the third face region and the subsequent steps.
  • The server pre-stores a trained face occlusion model.
  • The face occlusion model is a neural network model. After detecting the third face region where the face in the face image is located, the server inputs the face image into the face occlusion model, which outputs a result indicating whether the face in the third face region of the face image is occluded.
  • When the face is not occluded, the server detects the second face key points in the third face region and determines their coordinate positions. If the second face key points include the left eye, right eye, nose, left mouth corner, and right mouth corner, their coordinate positions are the coordinates of each of these points in the face image.
  • The server then determines, according to the coordinate positions of the second face key points, whether the face in the third face region is a frontal face. Specifically, the decision is based on the distance between the coordinate positions of each pair of second face key points. When the distance for every pair falls within its corresponding preset distance range, the face in the third face region is determined to be a frontal face. If the distance for any pair falls outside its preset distance range, the face is determined not to be a frontal face.
  • For example, let the distance between the left eye and the right eye be L1, and let the corresponding preset distance range be [L2, L3]. If L1 is not within [L2, L3], the face in the third face region is determined not to be a frontal face.
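The pairwise-distance test can be sketched as below. The key-point names and range values are hypothetical; the description does not give concrete ranges or state their units, so they are assumed here to be pixel distances in the face image. Every pair of key points must have a configured range, keyed by the two names in sorted order.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Point = Tuple[float, float]


def is_frontal(
    keypoints: Dict[str, Point],
    preset_ranges: Dict[Tuple[str, str], Tuple[float, float]],
) -> bool:
    """True if, for every pair of key points, their distance lies in its preset range.

    preset_ranges is keyed by the pair of key-point names in sorted order,
    e.g. ("left_eye", "right_eye") -> (L2, L3); a missing pair raises KeyError.
    """
    for a, b in combinations(sorted(keypoints), 2):
        low, high = preset_ranges[(a, b)]
        if not low <= math.dist(keypoints[a], keypoints[b]) <= high:
            return False  # e.g. L1 outside [L2, L3]: not a frontal face
    return True
```

With five key points (eyes, nose, mouth corners) there are ten pairs, so ten ranges would need to be configured.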
  • When the server determines that the face in the third face region is a frontal face, it performs the step of extracting the second face feature points in the third face region and the subsequent steps, that is, steps 403 to 406.
  • If the face in the third face region is occluded or is not a frontal face, the face image sent by the third terminal does not meet the requirements. The whole process then ends, and the server does not perform registration based on the face image and identity information of the registered user sent by the third terminal.
  • In this embodiment of the present application, the first face feature points are extracted from the cropped image and compared with the face feature points stored in the server to determine the user ID corresponding to the target user. The identity information of the target user corresponding to that user ID is then obtained.
  • When the captured video stream contains the target user's face, the cropped image also contains it, so the identity information of the target user can be known in time.
  • Staff can then tailor a marketing plan promptly according to the user's identity information, improving marketing efficiency and user experience.
  • Referring to FIG. 5, a structural block diagram of a terminal according to an embodiment of the present application is shown.
  • The terminal 500 provided in this embodiment of the present application is the first terminal, and includes:
  • the first to-be-recognized image extraction module 501 is configured to extract at least one frame of the first to-be-recognized image from the video stream;
  • a first face region detection module 502, configured to detect, based on the target detection model, the first face region where the face in each frame of the first to-be-recognized image is located;
  • a target face region determination module 503, configured to select a target image from the frames of the first to-be-recognized image and determine the first face region corresponding to the target image as the target face region;
  • a target image cropping module 504, configured to crop the target image according to the target face region to obtain a cropped image;
  • the cropped image sending module 505 is configured to send the cropped image to a server, so as to identify the cropped image by the server to determine the identity information of the target user.
  • the first to-be-recognized image extraction module 501 includes:
  • a first to-be-recognized image extraction submodule, configured to extract at least two frames of the first to-be-recognized image from the video stream at intervals of a preset number of frames;
  • the target face area determination module 503 includes:
  • an intersection ratio calculation submodule, configured to calculate, for each of the at least two frames of the first to-be-recognized image, the intersection ratio between the first face region of that frame and the first face region of the adjacent previous frame;
  • a target face region determination submodule, configured to, when the intersection ratios corresponding to the at least two frames of the first to-be-recognized image are all greater than or equal to a first set threshold, select any one of these frames as the target image and determine the first face region corresponding to the target image as the target face region.
  • the target image cropping module 504 includes:
  • a coordinate position detection submodule, configured to detect the coordinate positions corresponding to the first face key points in the target face region;
  • the target image cropping sub-module is configured to intercept an area including the first facial key point from the target image according to the coordinate position corresponding to the first facial key point to obtain the cropped image.
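The cropping submodule's behaviour can be sketched with NumPy slicing. The description only says that a region containing the first face key points is cut out of the target image. The padding margin and the clipping to image bounds are assumptions made here so that the crop keeps some context around the landmarks. Key points are `(x, y)` pixel coordinates, and `crop_around_keypoints` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

import numpy as np


def crop_around_keypoints(
    image: np.ndarray,
    keypoints: Sequence[Tuple[float, float]],
    margin: float = 0.3,
) -> np.ndarray:
    """Crop the smallest box containing all key points, enlarged by `margin`
    (a fraction of the box width/height on each side) and clipped to the image."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    x1, x2, y1, y2 = min(xs), max(xs), min(ys), max(ys)
    pad_x, pad_y = (x2 - x1) * margin, (y2 - y1) * margin
    h, w = image.shape[:2]
    left = max(0, math.floor(x1 - pad_x))
    right = min(w, math.ceil(x2 + pad_x))
    top = max(0, math.floor(y1 - pad_y))
    bottom = min(h, math.ceil(y2 + pad_y))
    return image[top:bottom, left:right]
```

Only this crop, rather than the full frame, is uploaded, which keeps the data sent to the server small.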
  • the terminal 500 further includes:
  • a user identification receiving module configured to receive the user identification of the target user sent by the server, and store the user identification and the target face area corresponding to the user identification;
  • a second to-be-recognized image extraction module, configured to sequentially extract N frames of the second to-be-recognized image from the video stream, where a second to-be-recognized image is a video frame located after the first to-be-recognized image in the video stream and N is a positive integer greater than 1;
  • the second face region detection module is configured to detect, based on the target detection model, the second face region where the face in the second to-be-recognized image of each frame is located;
  • an intersection ratio calculation module, configured to calculate the intersection ratio between the target face region and the second face region corresponding to each frame of the second to-be-recognized image;
  • a user identifier deletion module, configured to, when the intersection ratios between the target face region and the second face regions corresponding to N consecutively extracted frames of the second to-be-recognized image are all smaller than a second set threshold, delete the user ID and the target face region and invoke the first to-be-recognized image extraction module 501 again.
  • the first face region detection module 502 includes:
  • a first to-be-recognized image compression submodule, configured to compress each frame of the first to-be-recognized image so that the compressed image is smaller than the image before compression;
  • a first face region detection submodule, configured to input each compressed frame of the first to-be-recognized image into the SSD model to obtain the first face region where the face in each frame is located.
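The compression step and the coordinate bookkeeping it implies can be sketched as follows. The SSD model itself is not reproduced. The sketch only downscales a frame and maps a box detected on the small frame back to original pixel coordinates. Several choices are assumptions: `max_side=300` is borrowed from the common SSD300 input size, nearest-neighbour resampling is used, and frames already within `max_side` are left unchanged, even though the description requires the compressed image to be smaller. The function names are hypothetical.

```python
from typing import Tuple

import numpy as np

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def resize_nearest(image: np.ndarray, new_w: int, new_h: int) -> np.ndarray:
    """Nearest-neighbour resize using integer index maps."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image[rows[:, None], cols]


def compress_for_detection(
    image: np.ndarray, max_side: int = 300
) -> Tuple[np.ndarray, Tuple[float, float]]:
    """Downscale so the longer side is at most max_side; return image and (sx, sy)."""
    h, w = image.shape[:2]
    scale = min(1.0, max_side / max(h, w))
    new_w, new_h = max(1, round(w * scale)), max(1, round(h * scale))
    return resize_nearest(image, new_w, new_h), (new_w / w, new_h / h)


def box_to_original(box: Box, scales: Tuple[float, float]) -> Box:
    """Map a box detected on the compressed frame back to original pixels."""
    sx, sy = scales
    x1, y1, x2, y2 = box
    return (x1 / sx, y1 / sy, x2 / sx, y2 / sy)
```

Mapping boxes back matters because the later cropping step works on the original-resolution target image.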
  • In this embodiment of the present application, a target image is selected, the first face region corresponding to it is determined as the target face region, and then,
  • according to the target face region, the target image is cropped and the cropped image is sent to the server for face recognition to determine the identity information of the target user.
  • Thus the target user's identity information can be known in time, and staff can tailor a marketing plan promptly according to the user's identity information, improving marketing efficiency and user experience.
  • Referring to FIG. 6, a structural block diagram of a server according to an embodiment of the present application is shown.
  • the server 600 provided by the embodiment of the present application includes:
  • the cropped image receiving module 601 is configured to receive the cropped image sent by the first terminal; the cropped image is obtained by the first terminal after cropping the target image according to the target face area;
  • the first face feature point extraction module 602 is configured to extract the first face feature points from the cropped image;
  • the user identification determination module 603 is configured to compare the first facial feature point with the facial feature point stored in the server, and determine the user identification corresponding to the target user;
  • the identity information obtaining module 604 is configured to obtain the identity information of the target user corresponding to the user identification.
  • the server 600 further includes:
  • an identity information sending module, configured to send the identity information of the target user to a second terminal, so that the second terminal can give a reminder of the target user's visit.
  • the server 600 further includes:
  • a user identifier sending module, configured to send the user identifier of the target user to the first terminal, so that the first terminal calculates the intersection ratio between the target face region and the second face regions corresponding to N consecutively extracted frames of the second to-be-recognized image, thereby tracking the target user; N is a positive integer greater than 1.
  • the server 600 further includes:
  • a face image receiving module, configured to receive the face image and identity information of the registered user sent by the third terminal;
  • a third face region detection module, configured to detect, based on the target detection model, the third face region where the face in the face image is located;
  • a second face feature point extraction module, configured to extract the second face feature points in the third face region;
  • a face feature point comparison module configured to compare the second face feature point with the face feature point stored in the server, and determine whether the second face feature point is stored in the server;
  • a registration error information return module is configured to return registration error information to the third terminal when the second face feature point is stored in the server;
  • the second face feature point storage module is configured to save the second face feature point and the identity information of the registered user when the second face feature point is not stored in the server, and generate User identifiers respectively associated with the second facial feature points and the identity information of the registered user.
  • the server 600 further includes:
  • an occlusion judgment module, configured to determine, based on the face occlusion model, whether the face in the third face region is occluded;
  • a coordinate position detection module, configured to detect the coordinate positions corresponding to the second face key points in the third face region when the face in the third face region is not occluded;
  • a frontal face judgment module, configured to determine, according to the coordinate positions corresponding to the second face key points, whether the face in the third face region is a frontal face;
  • when the face in the third face region is a frontal face, the second face feature point extraction module is executed.
  • With the server of this embodiment of the present application, the first face feature points are extracted from the cropped image and compared with the face feature points stored in the server to determine the user ID corresponding to the target user. The identity information of the target user corresponding to that user ID is then obtained.
  • When the captured video stream contains the target user's face, the cropped image also contains it, so the identity information of the target user can be known in time.
  • Staff can then tailor a marketing plan promptly according to the user's identity information, improving marketing efficiency and user experience.
  • An embodiment of the present application further provides a terminal, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements the steps of the above terminal-side identity recognition method.
  • An embodiment of the present application further provides a computer-readable medium storing a computer program which, when executed by a processor, implements the steps of the above terminal-side identity recognition method.
  • An embodiment of the present application further provides a server, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements the steps of the above server-side identity recognition method.
  • Embodiments of the present application further provide a computer-readable medium, where a computer program is stored on the computer-readable medium, and when the computer program is executed by a processor, the steps of the foregoing server-side identification method are implemented.
  • Referring to FIG. 7, a structural diagram of an identity recognition system according to an embodiment of the present application is shown.
  • An embodiment of the present application further provides an identity recognition system, including a camera 701, a second terminal 702, a third terminal 703, the above-mentioned first terminal 500, and the above-mentioned server 600.
  • the camera 701 is configured to capture the video stream and send the video stream to the first terminal 500;
  • the second terminal 702 is configured to receive the identity information of the target user sent by the server 600 and give a reminder of the target user's visit;
  • the third terminal 703 is configured to send the face image and the identity information of the registered user to the server 600, and receive the registration error information returned by the server 600 when the second face feature point included in the face image is stored in the server 600.
  • The camera 701 sends the video stream to the first terminal 500, which extracts video frames from the video stream and selects the first to-be-recognized images and the second to-be-recognized images from these frames; the video frames located after the first to-be-recognized images are the second to-be-recognized images.
  • The first terminal 500 inputs the first to-be-recognized images into the target detection model to obtain the first face region where the face in each frame is located, selects the target image from these frames, and determines the first face region corresponding to the target image as the target face region. It then detects the first face key points in the target face region and determines their coordinate positions. According to these coordinate positions, it crops the region containing the first face key points from the target image to obtain the cropped image.
  • The first terminal 500 sends the cropped image to the server 600. The server 600 extracts the first face feature points from the cropped image and compares them with the face feature points stored in the face feature database to determine the user ID corresponding to the target user. It then queries the target user's identity information from the user identity information database according to the user ID and sends it to the second terminal 702, which gives a reminder of the target user's visit.
  • The server 600 also sends the user ID corresponding to the target user to the first terminal 500, which caches the user ID and the corresponding target face region. The first terminal 500 also inputs the second to-be-recognized images into the target detection model to obtain the second face region where the face in each second to-be-recognized image is located. It then calculates the intersection ratio between the target face region and the second face regions corresponding to N consecutively extracted frames of the second to-be-recognized image, thereby tracking the target user.
  • the registered user can also send the face image and the identity information of the registered user to the server 600 through the third terminal 703 to register the user information.
  • the server 600 stores the second face feature points from the face image in the face feature database, and stores the registered user's identity information in the user identity information database.
  • the camera can be deployed in a bank business hall or other occasions where identification of the target user's identity information is required.
  • the first face region corresponding to the target image is determined as the target face region; the target image is then cropped according to the target face region, and the cropped image is sent to the server.
  • the server extracts the first facial feature points from the cropped image, compares them with the facial feature points stored in the server to determine the user identifier corresponding to the target user, and then obtains the identity information of the target user corresponding to that user identifier.
  • when the collected video stream contains the face of the target user, the cropped image also contains that face, so the identity information of the target user can be learned promptly from the cropped image.
  • staff can then promptly customize a marketing plan according to the user's identity information, improving marketing efficiency and user experience.
  • the device embodiments described above are only illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
  • Various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the computing processing device according to the embodiments of the present application.
  • the present application can also be implemented as a device or apparatus program (e.g., a computer program and computer program product) for performing part or all of the methods described herein.
  • Such a program implementing the present application may be stored on a computer-readable medium, or may be in the form of one or more signals. Such signals may be downloaded from Internet sites, or provided on carrier signals, or in any other form.
  • FIG. 8 shows a computing processing device, such as the aforementioned server 600 or terminal 500, that can implement the method according to the present application.
  • the computing processing device traditionally includes a processor 810 and a computer program product or computer-readable medium in the form of a memory 820.
  • the memory 820 may be electronic memory such as flash memory, EEPROM (electrically erasable programmable read only memory), EPROM, hard disk, or ROM.
  • the memory 820 has storage space 830 for program code 831 for performing any of the method steps in the above-described methods.
  • storage space 830 for program code may include various program codes 831 for implementing various steps in the above methods, respectively. These program codes can be read from or written to one or more computer program products.
  • These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such computer program products are typically portable or fixed storage units as described with reference to FIG. 9 .
  • the storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 820 in the computing processing device of FIG. 8 .
  • the program code may, for example, be compressed in a suitable form.
  • the storage unit includes computer-readable code 831', i.e., code readable by a processor such as 810, which, when executed by a computing processing device, causes the computing processing device to perform the various steps of the methods described above.
  • any reference signs placed between parentheses shall not be construed as limiting the claim.
  • the word “comprising” does not exclude the presence of elements or steps not listed in a claim.
  • the word “a” or “an” preceding an element does not exclude the presence of a plurality of such elements.
  • the application can be implemented by means of hardware comprising several different elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
  • the use of the words first, second, third, etc. does not denote any order; these words can be interpreted as names.
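In the tracking step above, the second face regions from N consecutively extracted second to-be-recognized images are compared with the cached target face region by intersection over union (IoU). Below is a minimal Python sketch. It assumes boxes in (x, y, width, height) form. It also assumes a hypothetical rule: tracking continues while every one of the N boxes has an IoU of at least 0.5 with the target face region. The text gives neither the rule nor the threshold, and `iou` and `is_same_target` are illustrative names.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), top-left origin


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap; clamped to 0 when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def is_same_target(target: Box, later: Sequence[Box], threshold: float = 0.5) -> bool:
    """Assumed rule: still the cached target if all N later boxes overlap it enough."""
    return len(later) > 0 and all(iou(target, box) >= threshold for box in later)
```

Because IoU is normalised by the union, the same threshold works for near and far faces, which is one reason it is a common choice for frame-to-frame box association.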

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of computers, and provides an identity recognition method, a terminal, a server and a system. According to the present application, at least one frame of a first image to be recognized is extracted from a video stream, and a target image is selected from said image; a first face region corresponding to the target image is determined as a target face region; then, according to the target face region, the target image is cropped; and the cropped image is sent to a server for face recognition to determine identity information of a target user. When a collected video stream comprises the face of a target user, identity information of the target user can be known in a timely manner, and thus, a clerk can customize a marketing plan in a timely manner according to the identity information of the user, thereby increasing marketing efficiency and improving user experience.

Description

Identity recognition method, terminal, server and system
Technical Field
The present application relates to the field of computer technology, and in particular to an identity recognition method, terminal, server and system.
Background
In daily life and work, people often need to visit places such as bank business halls to handle related business.
At present, before business is handled, a user must take a queue number from an automatic number dispenser and wait to be called to the corresponding counter in number order. Only when the user handles the business at the counter and presents the relevant identity documents can bank staff learn the user's identity information. For example, the identity information of a bank's VIP users can only be determined when those VIP users go to the counter to conduct business.
However, with the current approach of taking a number, queuing, and then going to the counter, staff cannot learn the user's identity information in time, and therefore cannot promptly customize a marketing plan based on that information, resulting in a poor user experience.
Summary
Some embodiments of the present application provide the following technical solutions:
In a first aspect, an identity recognition method applied to a first terminal is provided, the method including:
extracting at least one frame of a first to-be-recognized image from a video stream;
detecting, based on a target detection model, a first face region where a face is located in each frame of the first to-be-recognized image;
selecting a target image from the frames of the first to-be-recognized image, and determining the first face region corresponding to the target image as a target face region;
cropping the target image according to the target face region to obtain a cropped image;
sending the cropped image to a server, so that the server recognizes the cropped image to determine identity information of a target user.
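The five steps of the first aspect can be read as a small pipeline on the first terminal. The Python sketch below wires them together and takes each step as a function argument. It therefore shows only the order of operations and the point where data leaves the terminal. The names `identify_from_frames`, `detect`, `select`, `crop` and `send_to_server` are hypothetical and are not part of the application.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def identify_from_frames(
    frames: Sequence[Any],
    detect: Callable[[Any], Optional[Box]],
    select: Callable[[List[Tuple[Any, Box]]], Tuple[Any, Box]],
    crop: Callable[[Any, Box], Any],
    send_to_server: Callable[[Any], Any],
) -> Optional[Any]:
    """Terminal-side flow of the first aspect, with each step injected."""
    # Detect the first face region in every extracted first to-be-recognized image.
    candidates: List[Tuple[Any, Box]] = []
    for frame in frames:
        box = detect(frame)
        if box is not None:
            candidates.append((frame, box))
    if not candidates:
        return None  # no face in any frame: nothing is sent to the server
    # Choose the target image; its face region becomes the target face region.
    target_image, target_box = select(candidates)
    # Crop locally so only the face crop, not the full frame, leaves the terminal.
    cropped = crop(target_image, target_box)
    # The server's reply format is an assumption; it is returned unchanged here.
    return send_to_server(cropped)
```

Injecting the steps keeps the sketch independent of any particular detector or transport. In practice `detect` would wrap the target detection model, and `send_to_server` would wrap whatever network call the deployment uses.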
In a second aspect, an identity recognition method applied to a server is provided, the method including:
receiving a cropped image sent by a first terminal, the cropped image being obtained by the first terminal by cropping a target image according to a target face region;
extracting first face feature points from the cropped image;
comparing the first face feature points with face feature points stored in the server to determine a user identifier corresponding to a target user;
acquiring identity information of the target user corresponding to the user identifier.
In a third aspect, a terminal is provided, the terminal being a first terminal that includes:
a first to-be-recognized image extraction module, configured to extract at least one frame of a first to-be-recognized image from a video stream;
a first face region detection module, configured to detect, based on a target detection model, a first face region where a face is located in each frame of the first to-be-recognized image;
a target face region determination module, configured to select a target image from the frames of the first to-be-recognized image and determine the first face region corresponding to the target image as a target face region;
a target image cropping module, configured to crop the target image according to the target face region to obtain a cropped image;
a cropped image sending module, configured to send the cropped image to a server, so that the server recognizes the cropped image to determine identity information of a target user.
In a fourth aspect, a server is provided, including:
a cropped image receiving module, configured to receive a cropped image sent by a first terminal, the cropped image being obtained by the first terminal by cropping a target image according to a target face region;
a first face feature point extraction module, configured to extract first face feature points from the cropped image;
a user identifier determination module, configured to compare the first face feature points with face feature points stored in the server and determine a user identifier corresponding to a target user;
an identity information acquisition module, configured to acquire identity information of the target user corresponding to the user identifier.
In a fifth aspect, a terminal is provided, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the above identity recognition method.
In a sixth aspect, a computer-readable medium is provided, storing a computer program that, when executed by a processor, implements the steps of the above identity recognition method.
In a seventh aspect, a server is provided, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the above identity recognition method.
In an eighth aspect, a computer-readable medium is provided, storing a computer program that, when executed by a processor, implements the steps of the above identity recognition method.
In a ninth aspect, an identity recognition system is provided, including a camera, a second terminal, a third terminal, the above first terminal, and the above server, where:
the camera is configured to capture a video stream and send the video stream to the first terminal;
the second terminal is configured to receive the identity information of the target user sent by the server, so as to issue a visit reminder for the target user;
the third terminal is configured to send a face image and identity information of a registered user to the server, and to receive registration error information returned by the server when second face feature points contained in the face image are already stored in the server.
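The third terminal's registration path can be sketched as a server that handles each request in one of two ways. If the submitted face features already match a stored face, it rejects the registration. Otherwise it stores the features and the identity information under a new user identifier. This is a minimal Python sketch. The cosine-similarity test, the 0.9 threshold and the class name `RegistrationServer` are all assumptions. The text says only that registration error information is returned when the features are already stored.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


def _cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class RegistrationServer:
    duplicate_threshold: float = 0.9  # hypothetical "same face" threshold
    face_features: Dict[int, List[float]] = field(default_factory=dict)  # face feature database
    identities: Dict[int, Dict[str, str]] = field(default_factory=dict)  # user identity database
    _next_id: int = 1

    def register(self, features: Sequence[float], identity: Dict[str, str]) -> Dict[str, object]:
        # Reject the request if these features already match a stored face.
        for stored in self.face_features.values():
            if _cosine(stored, features) >= self.duplicate_threshold:
                return {"ok": False, "error": "face already registered"}
        user_id = self._next_id
        self._next_id += 1
        # The user identifier is the shared index linking both databases.
        self.face_features[user_id] = list(features)
        self.identities[user_id] = dict(identity)
        return {"ok": True, "user_id": user_id}
```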
The above description is only an overview of the technical solution of the present application. Specific embodiments of the present application are described below. They are provided so that the technical means of the present application can be understood more clearly and implemented according to the contents of the description, and so that the above and other objects, features and advantages of the present application are more apparent.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 schematically shows a flowchart of an identity recognition method according to an embodiment of the present application;
FIG. 2 schematically shows a detailed flowchart of an identity recognition method according to an embodiment of the present application;
FIG. 3 schematically shows a flowchart of another identity recognition method according to an embodiment of the present application;
FIG. 4 schematically shows a flowchart of the process of registering the face feature points and identity information of a registered user in an embodiment of the present application;
FIG. 5 schematically shows a structural block diagram of a terminal according to an embodiment of the present application;
FIG. 6 schematically shows a structural block diagram of a server according to an embodiment of the present application;
FIG. 7 schematically shows a structural diagram of an identity recognition system according to an embodiment of the present application;
FIG. 8 schematically shows a block diagram of a computing processing device for performing the method according to the present application; and
FIG. 9 schematically shows a storage unit for holding or carrying program code that implements the method according to the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, a flowchart of an identity recognition method according to an embodiment of the present application is shown. The method is applied to a first terminal and may specifically include the following steps:
Step 101: extract at least one frame of a first to-be-recognized image from a video stream.
In this embodiment, the camera captures a video stream in real time and sends it to the first terminal. The first terminal reads the captured video stream in real time using multiple threads and extracts video frames from it according to a preset extraction mode.
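Multithreaded reading usually means one thread pulls frames from the camera while another consumes them, so slow processing does not stall capture. This is a minimal producer/consumer sketch using Python's standard `threading` and `queue` modules. `source` stands in for the camera, and the helper names are hypothetical.

```python
import queue
import threading
from typing import Any, Iterable, Iterator

_END = object()  # sentinel marking the end of the stream


def start_reader(source: Iterable[Any], maxsize: int = 64) -> "queue.Queue":
    """Read frames from `source` on a background thread into a bounded queue."""
    frames: "queue.Queue" = queue.Queue(maxsize=maxsize)

    def run() -> None:
        for frame in source:
            frames.put(frame)  # blocks while the queue is full
        frames.put(_END)

    threading.Thread(target=run, daemon=True).start()
    return frames


def iter_frames(frames: "queue.Queue") -> Iterator[Any]:
    """Consume frames in arrival order until the reader signals the end."""
    while True:
        item = frames.get()
        if item is _END:
            return
        yield item
```

With a real camera, `source` could be a generator around OpenCV's `cv2.VideoCapture.read()`, which returns a `(success, frame)` pair. A live-capture reader would also typically drop stale frames instead of blocking when the queue is full; this sketch does not do that.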
Specifically, every frame in the video stream may be extracted to obtain multiple video frames. Alternatively, one frame may be extracted at every preset frame interval, for example one frame every 10 frames or one frame every 5 frames.
Next, at least one first to-be-recognized image is selected from the obtained video frames. A single video frame may be selected, in which case one frame of the first to-be-recognized image is extracted from the video stream. Alternatively, at least two video frames may be selected, in which case at least two frames of the first to-be-recognized image are extracted.
For example, suppose the video stream lasts 1 s at 60 frames per second and one frame is extracted every 10 frames. Six video frames are then obtained: frames 1, 11, 21, 31, 41 and 51 of the video stream become the first through sixth video frames. The first video frame alone may be selected as the first to-be-recognized image, giving one frame. Alternatively, the first, second and third video frames may be selected, giving three frames.
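The sampling in this example is a fixed stride over frame numbers. This short Python sketch reproduces it, assuming 1-based frame numbers and sampling that starts at the first frame. `sampled_frame_numbers` is a hypothetical helper.

```python
from typing import List


def sampled_frame_numbers(total_frames: int, interval: int, start: int = 1) -> List[int]:
    """1-based frame numbers kept when one frame is taken every `interval` frames."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    # interval == 1 keeps every frame; larger intervals thin the stream.
    return list(range(start, total_frames + 1, interval))
```

`sampled_frame_numbers(60, 10)` gives `[1, 11, 21, 31, 41, 51]`. Its first three entries, `[1, 11, 21]`, correspond to the three-frame selection in the example.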
Step 102: detect, based on the target detection model, the first face region where the face is located in each frame of the first to-be-recognized image.
In this embodiment, a pre-trained target detection model is stored in the first terminal. The target detection model is a neural network model trained on multiple sample images in which face regions have been manually annotated.
After extracting at least one frame of the first to-be-recognized image from the video stream, the first terminal inputs each frame into the target detection model to obtain the first face region where the face in that frame is located.
In practice, the first face region is a rectangular box. A detected first face region is represented by its position information, for example the coordinates of the top-left corner of the box together with the box's width and height.
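The (top-left x, top-left y, width, height) representation is easy to misuse near image borders, where a detector may return a box that extends past the frame. This small Python sketch, using the hypothetical name `FaceBox`, converts a box to corner coordinates and clips it to the image size.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FaceBox:
    x: int  # column of the top-left corner
    y: int  # row of the top-left corner
    w: int  # box width in pixels
    h: int  # box height in pixels

    def corners(self) -> Tuple[int, int, int, int]:
        """Return (x1, y1, x2, y2) with x2 and y2 exclusive."""
        return (self.x, self.y, self.x + self.w, self.y + self.h)

    def clipped(self, img_w: int, img_h: int) -> "FaceBox":
        """Intersect the box with the image bounds; an empty result has zero size."""
        x1, y1 = max(0, self.x), max(0, self.y)
        x2, y2 = min(img_w, self.x + self.w), min(img_h, self.y + self.h)
        return FaceBox(x1, y1, max(0, x2 - x1), max(0, y2 - y1))
```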
Step 103: select a target image from the frames of the first to-be-recognized image, and determine the first face region corresponding to the target image as the target face region.
In this embodiment, after detecting the first face region in each frame of the first to-be-recognized image, the first terminal selects a target image from these frames. It then determines the first face region corresponding to the target image as the target face region.
When only one frame of the first to-be-recognized image is extracted, that frame is the target image and its first face region is the target face region. When at least two frames are extracted, one of them is selected as the target image according to the relationship between their first face regions, and the first face region of that frame is the target face region.
For example, suppose three frames of the first to-be-recognized image are extracted, namely the first, second and third video frames extracted from the video stream. The third video frame may be selected as the target image, and its first face region is then the target face region.
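The text leaves the selection rule open ("according to the relationship between the first face regions"), and its example simply takes the third of three frames. This is a minimal sketch of one possible rule: the target image is the most recent frame in which a face region was detected. That rule is an assumption. `select_target` and the convention of using `None` for a frame with no face are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def select_target(regions: List[Optional[Box]]) -> Optional[Tuple[int, Box]]:
    """Return (frame index, face region) of the latest frame with a detected face.

    regions[i] is the first face region detected in the i-th first
    to-be-recognized image, or None if no face was found in that frame.
    A single frame that contains a face is returned as-is (the one-frame case).
    """
    for index in range(len(regions) - 1, -1, -1):
        region = regions[index]
        if region is not None:
            return index, region
    return None
```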
Step 104: crop the target image according to the target face region to obtain a cropped image.
In this embodiment, after determining the target image and its target face region, the first terminal crops the target image according to the target face region. The resulting cropped image contains the face within the target face region.
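Cropping by the target face region amounts to slicing out the pixel rows and columns that fall inside the box. This minimal sketch assumes an image stored as a list of pixel rows and a box in (x, y, width, height) form. The optional `margin`, which enlarges the box by a fraction of its size before clipping, is an assumption and is not specified in the text. With NumPy arrays, the same crop is `image[y1:y2, x1:x2]`.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def crop(image: Sequence[Sequence[int]], box: Box, margin: float = 0.0) -> List[List[int]]:
    """Cut the target face region out of an image stored as rows of pixels.

    The box is first enlarged by `margin` times its width/height on each side
    (an assumed option), then clipped to the image so slicing never fails.
    """
    img_h = len(image)
    img_w = len(image[0]) if img_h else 0
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    x1, y1 = max(0, x - dx), max(0, y - dy)
    x2, y2 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return [list(row[x1:x2]) for row in image[y1:y2]]
```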
Step 105: send the cropped image to the server, so that the server recognizes the cropped image to determine the identity information of the target user.
In this embodiment, after obtaining the cropped image, the first terminal sends it to the server, which receives it and extracts the first face feature points from it. The server compares these points with the face feature points stored in the server to determine the user identifier corresponding to the target user. It then obtains the identity information of the target user corresponding to that user identifier, which completes the identification.
The first face feature points may be at least one feature point of the face, such as the nose, left eye, right eye or mouth. The user identifier is associated with both the target user's face feature points and identity information stored in the server. It is in effect an index number that links the stored face feature points to the stored identity information. The target user's identity information includes the target user's name, age, gender, ID card number, mobile phone number, occupation, education and similar information.
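The user identifier is an index shared by the stored face feature points and the stored identity information. Recognition therefore reduces to finding the closest stored feature vector and then looking up its index. This minimal sketch makes three assumptions that the text does not specify: the extracted face feature points are flattened into fixed-length numeric vectors, similarity is measured by cosine similarity, and 0.8 is the acceptance threshold. The name `identify` is also illustrative.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(
    query: Sequence[float],
    feature_db: Dict[int, Sequence[float]],
    identity_db: Dict[int, Dict[str, str]],
    threshold: float = 0.8,
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return (user identifier, identity information) for the best match, or None."""
    best_id: Optional[int] = None
    best_score = threshold
    for user_id, stored in feature_db.items():
        score = cosine_similarity(query, stored)
        if score >= best_score:  # clears the threshold and is at least as good as earlier matches
            best_id, best_score = user_id, score
    if best_id is None:
        return None  # unknown face: no registered user is close enough
    # The user identifier is the index shared with the identity database.
    return best_id, identity_db[best_id]
```

A linear scan is enough to show the idea. A large face feature database would more likely use an approximate nearest-neighbour index, but the index-then-lookup structure stays the same.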
When the camera captures a video stream containing the target user's face, the first terminal and the server together can determine the target user's identity information directly and quickly. Staff can therefore promptly customize a marketing plan according to the user's identity information, improving marketing efficiency and user experience.
In addition, the target image is cropped according to the target face region, and only the cropped image, rather than the whole target image, is sent to the server to identify the target user. This reduces the server's computational load. Moreover, target image extraction and cropping run on the first terminal, while the server handles recognition and comparison of the first face feature points and the identity information query. Splitting the functions between the first terminal and the server, instead of integrating them all in one device, effectively relieves the bandwidth pressure of video stream transmission, that is, it reduces network bandwidth pressure.
In this embodiment, at least one frame of the first to-be-recognized image is extracted from the video stream, and a target image is selected from it. The first face region corresponding to the target image is determined as the target face region. The target image is then cropped according to the target face region, and the cropped image is sent to the server for face recognition to determine the target user's identity information. When the captured video stream contains the target user's face, the target user's identity can be learned promptly, so staff can customize a marketing plan in time, improving marketing efficiency and user experience.
Referring to FIG. 2, a detailed flowchart of an identity recognition method according to an embodiment of the present application is shown. The method is applied to the first terminal and may specifically include the following steps:
Step 201: Extract at least two frames of the first image to be recognized from the video stream, taking one frame every preset number of frames.
In this embodiment of the present application, the camera captures the video stream in real time and sends it to the first terminal. The first terminal extracts one frame from the video stream every preset number of frames to obtain multiple video frames. It then selects at least two of these video frames as the first images to be recognized. In this way, at least two frames of the first image to be recognized are extracted from the video stream.
Specifically, among the at least two frames of the first image to be recognized, any two adjacent frames may be consecutive extracted frames, with no other extracted frame between them. In the original video stream, however, they are separated by the preset number of frames.
For example, three frames of the first image to be recognized are extracted from the video stream: the first, second, and third extracted video frames. These are the 1st, 11th, and 21st frames of the video stream, respectively. The three extracted frames are consecutive in the sense that no other extracted frame lies between them. Any two adjacent ones are separated in the video stream by the preset number of frames, which here is 10.
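The sampling rule of Step 201 and the example above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. Frames are stood in for by their 1-based frame numbers, and the helper names `sample_every` and `first_images_to_recognize` are hypothetical. In a real terminal, the iterable would yield decoded frames from the camera stream.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sample_every(frames: Iterable[T], interval: int) -> Iterator[T]:
    """Yield the first frame, then one frame every `interval` frames."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    for i, frame in enumerate(frames):
        if i % interval == 0:
            yield frame


def first_images_to_recognize(frames: Iterable[T], interval: int, count: int) -> List[T]:
    """Collect `count` sampled frames (Step 201 requires count >= 2)."""
    if count < 2:
        raise ValueError("Step 201 extracts at least two frames")
    return list(islice(sample_every(frames, interval), count))


# Frame numbers 1..100 stand in for decoded video frames.
print(first_images_to_recognize(range(1, 101), interval=10, count=3))  # [1, 11, 21]
```

Adjacent sampled frames are consecutive in the returned list but 10 frames apart in the stream, matching the example.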
Step 202: Compress each frame of the first image to be recognized, so that each compressed first image to be recognized is smaller than it was before compression.
In this embodiment of the present application, the first terminal first extracts at least two frames of the first image to be recognized from the video stream. It then compresses each frame so that the compressed image is smaller than the original. Specifically, each first image to be recognized may be scaled down proportionally, preserving its aspect ratio.
For example, if a first image to be recognized has width W and height H before compression, the compressed image has width W/3 and height H/3.
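A proportional 1/3 reduction like the one in this example could be sketched with NumPy block averaging. The function name `downscale` and the averaging method are choices made here for illustration; they are not taken from the application. In practice, a library resize such as OpenCV's `cv2.resize` with the `cv2.INTER_AREA` flag would more typically be used.

```python
import numpy as np


def downscale(image: np.ndarray, factor: int = 3) -> np.ndarray:
    """Shrink an H x W (x C) image to (H // factor) x (W // factor) by averaging factor x factor blocks."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    h, w = image.shape[:2]
    h2, w2 = h // factor, w // factor
    # Drop edge rows/columns that do not fill a whole block.
    trimmed = image[: h2 * factor, : w2 * factor]
    blocks = trimmed.reshape((h2, factor, w2, factor) + image.shape[2:])
    return blocks.mean(axis=(1, 3)).astype(image.dtype)


frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # H = 1080, W = 1920
print(downscale(frame).shape)  # (360, 640, 3)
```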
Step 203: Input each compressed frame of the first image to be recognized into the SSD model. This yields, for each frame, the first face region in which the face is located.
In this embodiment of the present application, a pre-trained object detection model is stored in the first terminal. The object detection model is a neural network model, and an SSD (Single Shot MultiBox Detector) model may be used.
After compressing each frame of the first image to be recognized, the first terminal inputs each compressed frame into the SSD model. The SSD model outputs the position information of the first face region in which the face in that frame is located. This includes, for example, the coordinates of the upper-left corner of the rectangular box enclosing the first face region, together with the box's width and height. The SSD model also outputs the category corresponding to the first face region.
Because each first image to be recognized is compressed before it is input into the SSD model for detection, the detection time is shortened.
The SSD model includes, connected in sequence, a backbone network, a multi-scale detection sub-network, and an NMS (Non-Maximum Suppression) network. The backbone network may be a deep convolutional neural network such as VGG16. The multi-scale detection sub-network includes several sequentially connected convolutional layers. A classification network layer and a position regression network layer are connected to each of those convolutional layers.
When a compressed first image to be recognized is input into the SSD model, the following steps take place:
1. The backbone network extracts features from the image to obtain a feature map.
2. The feature map is passed through the multiple convolutional layers. This yields, for preselected (default) boxes of different scales and aspect ratios, the corresponding category probability values (scores) and position offsets (locations).
3. The category probability values, position offsets, and preselected boxes are input into the classification network layer and the position regression network layer for classification and regression.
4. The classified and regressed preselected boxes are input into the NMS network. The NMS network removes redundant preselected boxes and keeps the preselected box with the highest confidence as the rectangular box of the first face region.
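The final NMS stage described above can be illustrated with a greedy non-maximum suppression sketch. This is a generic version of the technique under stated assumptions, not the SSD model's actual code. Boxes use the (x, y, w, h) upper-left-corner format from Step 203, and the 0.45 overlap threshold is an illustrative value.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): upper-left corner, width, height


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.45) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard remaining boxes that overlap the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(10, 10, 100, 100), (12, 12, 100, 100), (300, 300, 50, 50)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

With a single face in view, the first kept index (here box 0) plays the role of the highest-confidence box used as the first face region.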
Step 204: For the at least two frames of the first image to be recognized, calculate for each frame the intersection over union between its first face region and the first face region of the adjacent preceding frame.
In this embodiment of the present application, the intersection over union (IoU) is calculated for each pair of adjacent frames among the at least two frames of the first image to be recognized. Each IoU compares the first face region of a frame with the first face region of the adjacent preceding frame. The IoU of two first face regions is the area of their overlap divided by the area of their union. The union is the total area covered by the two regions, with the overlap counted once.
For example, three frames of the first image to be recognized are extracted from the video stream: the 1st, 11th, and 21st frames of the video stream. Their first face regions are first face region 1, first face region 2, and first face region 3, respectively. Two IoUs are then calculated: one between first face region 1 and first face region 2, and one between first face region 2 and first face region 3.
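The IoU defined in Step 204 and the pairwise comparison in this example can be sketched as follows. Regions are assumed to be (x, y, w, h) rectangles as output in Step 203. The name `consecutive_ious` is a hypothetical helper, and the coordinates are made up.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Overlap area divided by union area (overlap counted once)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def consecutive_ious(regions: Sequence[Box]) -> List[float]:
    """IoU of each region with the region of the adjacent preceding frame."""
    return [iou(prev, cur) for prev, cur in zip(regions, regions[1:])]


# First face regions 1, 2 and 3 from the 1st, 11th and 21st frames.
regions = [(0, 0, 10, 10), (5, 0, 10, 10), (5, 0, 10, 10)]
print(consecutive_ious(regions))  # [0.3333333333333333, 1.0]
```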
Step 205: When the IoUs corresponding to the at least two frames of the first image to be recognized are all greater than or equal to a first set threshold, select any one of these frames as the target image. Determine the first face region corresponding to the target image as the target face region.
In this embodiment of the present application, all IoUs being greater than or equal to the first set threshold indicates that the first terminal has not misidentified the first face region in which the face is located. In other words, the detected first face region is accurate. In this case, any one of the at least two frames is selected as the target image, and its first face region is determined as the target face region. Usually, the last of these frames is selected as the target image. The first set threshold can be set manually, for example to 0.5.
For example, suppose the IoU between first face region 1 and first face region 2 is 0.6, and the IoU between first face region 2 and first face region 3 is 0.8. Both values exceed the first set threshold of 0.5. The three extracted frames are the 1st, 11th, and 21st frames of the video stream. The 21st frame is therefore selected as the target image, and its first face region 3 is used as the target face region.
It should be noted that any IoU smaller than the first set threshold means the detected first face region is judged to be inaccurate. That is, the first terminal may have mistaken some other object for a face. The erroneous first face region is therefore not processed further. Instead, the first terminal again extracts first images to be recognized from the video stream and repeats the subsequent steps.
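Steps 204 and 205, including the fallback described in this note, can be combined into one selection routine. The sketch below is hypothetical and rests on several assumptions:

- Each frame has exactly one detected face region.
- The last frame is picked, as the usual choice mentioned above.
- The first set threshold is 0.5.
- A return value of `None` signals that extraction must restart.

```python
from typing import Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Overlap area divided by union area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def select_target(frames: Sequence[T], regions: Sequence[Box],
                  first_threshold: float = 0.5) -> Optional[Tuple[T, Box]]:
    """Return (target image, target face region), or None if detection looks unreliable."""
    if len(frames) < 2 or len(frames) != len(regions):
        raise ValueError("need at least two frames, each with one face region")
    ious = [iou(prev, cur) for prev, cur in zip(regions, regions[1:])]
    if all(v >= first_threshold for v in ious):
        # Any frame may be chosen; the most recent one is the usual choice.
        return frames[-1], regions[-1]
    return None  # Possible false detection: re-extract first images to be recognized.


frames = [1, 11, 21]  # frame numbers stand in for images
stable = [(0, 0, 10, 10), (1, 0, 10, 10), (2, 0, 10, 10)]
print(select_target(frames, stable))  # (21, (2, 0, 10, 10))
jumpy = [(0, 0, 10, 10), (1, 0, 10, 10), (50, 50, 10, 10)]
print(select_target(frames, jumpy))  # None
```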
Step 206: Detect the coordinate positions of the first face key points within the target face region.
In this embodiment of the present application, the first terminal first determines the target image and its target face region. It then detects the first face key points within the target face region by means of a face detection algorithm and determines their coordinate positions.
The first face key points include key points such as the left eye, right eye, nose, left mouth corner, and right mouth corner. Their coordinate positions are accordingly the coordinates of the left eye, right eye, nose, left mouth corner, and right mouth corner in the target image.
The target face region in the target image is determined first, and the first face key points are then detected only within that region. This reduces the amount of computation on the first terminal.
Step 207: According to the coordinate positions of the first face key points, cut out from the target image a region that contains the first face key points, to obtain the cropped image.
In this embodiment of the present application, the first terminal detects the coordinate positions of the first face key points within the target face region. According to those coordinate positions, it then cuts out from the target image a region containing the first face key points, thereby obtaining the cropped image.
Specifically, the target image is cropped to a preset crop size, so that the cropped image has the preset crop size and contains all of the first face key points.
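A crop that has a fixed preset size yet contains every key point can be sketched as follows. The 112 x 112 crop size, the integer (x, y) key-point format, and the helper names are assumptions for illustration; the application does not specify them. The window is centred on the key points and then shifted, if necessary, to stay inside the image. An error is raised if the key points cannot fit.

```python
from typing import Sequence, Tuple

import numpy as np

Point = Tuple[int, int]  # (x, y) pixel coordinates


def _start(lo_pt: int, hi_pt: int, size: int, limit: int) -> int:
    """Choose a window start s with s <= lo_pt, s + size - 1 >= hi_pt and 0 <= s <= limit - size."""
    lo = max(hi_pt - size + 1, 0)
    hi = min(lo_pt, limit - size)
    if lo > hi:
        raise ValueError("key points do not fit in the preset crop size")
    centred = (lo_pt + hi_pt + 1 - size) // 2
    return min(max(centred, lo), hi)


def crop_around_keypoints(image: np.ndarray, keypoints: Sequence[Point],
                          crop_w: int = 112, crop_h: int = 112) -> Tuple[np.ndarray, Tuple[int, int]]:
    """Cut a crop_h x crop_w patch containing all key points; also return its (left, top)."""
    img_h, img_w = image.shape[:2]
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    left = _start(min(xs), max(xs), crop_w, img_w)
    top = _start(min(ys), max(ys), crop_h, img_h)
    return image[top:top + crop_h, left:left + crop_w], (left, top)


# Left eye, right eye, nose, left mouth corner, right mouth corner (made-up positions).
landmarks = [(300, 200), (340, 200), (320, 230), (305, 260), (335, 260)]
target_image = np.zeros((480, 640, 3), dtype=np.uint8)
crop, origin = crop_around_keypoints(target_image, landmarks)
print(crop.shape, origin)  # (112, 112, 3) (264, 174)
```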
Step 208: Send the cropped image to the server, so that the server recognizes the cropped image and determines the identity information of the target user.
The principle of this step is similar to that of step 105 in Embodiment 1 above and is not repeated here.
Further, the following steps are included after step 208:
- Receive the user identifier of the target user sent by the server, and store the user identifier together with the corresponding target face region.
- Sequentially extract N frames of a second image to be recognized from the video stream. A second image to be recognized is a video frame located after the first images to be recognized in the video stream, and N is a positive integer greater than 1.
- Based on the object detection model, detect the second face region in which the face in each frame of the second image to be recognized is located.
- Calculate the IoU between the target face region and the second face region of each frame.
- If, among the second face regions of the N consecutively extracted frames, at least one has an IoU with the target face region greater than or equal to a second set threshold, repeat the step of sequentially extracting N frames of the second image to be recognized and the subsequent steps.
- If the IoUs between the target face region and the second face regions of all N consecutively extracted frames are smaller than the second set threshold, delete the user identifier and the target face region. Then repeat the step of extracting at least one frame of the first image to be recognized from the video stream and the subsequent steps.
In this embodiment of the present application, after the first terminal sends the cropped image to the server, the server extracts the first face feature points from the cropped image. It compares them with the face feature points stored in the server to determine the user identifier of the target user, and sends that user identifier to the first terminal. The first terminal receives the user identifier of the target user and caches it together with the corresponding target face region.
The first terminal acquires the video stream captured by the camera in real time and splits it into video frames as needed. The video frames located after the first images to be recognized in the video stream are referred to as second images to be recognized. N frames of the second image to be recognized can thus be extracted from the video stream in sequence.
For example, the first images to be recognized are the first, second, and third extracted video frames, which are the 1st, 11th, and 21st frames of the video stream. Extracted video frames after the third one are referred to as second images to be recognized. For example, the fourth, fifth, and sixth extracted video frames are all second images to be recognized; they are the 31st, 41st, and 51st frames of the video stream.
After extracting the second images to be recognized, the first terminal inputs each of them into the object detection model. This yields the second face region in which the face in each frame is located. The object detection model may also be the SSD model. Detecting the second face region in a second image to be recognized is similar to detecting the first face region in a first image to be recognized, and is not repeated here.
The first terminal then calculates the IoU between the cached target face region and the second face region of each frame of the second image to be recognized.
Suppose that, among the second face regions of the N consecutively extracted frames of the second image to be recognized, at least one has an IoU with the target face region greater than or equal to the second set threshold. It is then determined that tracking of the target user corresponding to the target face region has not been lost. In this case, the first terminal continues with the step of sequentially extracting N frames of the second image to be recognized and the subsequent steps. It keeps extracting second images to be recognized from the video stream and detects the second face region in each. It then calculates the IoU between the target face region and each second face region and compares the result again with the second set threshold.
Suppose instead that the IoUs between the target face region and the second face regions of all N consecutively extracted frames are smaller than the second set threshold. It is then determined that tracking of the target user corresponding to the target face region has been lost. In this case, the cached user identifier and target face region are deleted, and Step 201 and the subsequent steps are performed again:
1. First images to be recognized are again extracted from the video stream, and the first face region in each is detected.
2. A target image and a target face region are selected.
3. The target image is cropped.
4. The cropped image is sent to the server to identify the target user.
The second set threshold may be equal to or different from the first set threshold. The number N of consecutively extracted frames of the second image to be recognized can be set manually according to the actual situation, for example N = 20.
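The tracking rule above can be sketched as a window check against the cached target face region. This is a hypothetical illustration built on the following assumptions:

- The cache is a plain dictionary mapping each user identifier to its target face region.
- Each frame contributes a list of detected second face regions.
- N = 2 and a threshold of 0.5 keep the example short; the text suggests N = 20.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def iou(a: Box, b: Box) -> float:
    """Overlap area divided by union area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def track_user(cache: Dict[str, Box], user_id: str,
               frames_regions: Sequence[List[Box]], n: int,
               second_threshold: float = 0.5) -> bool:
    """Check consecutive windows of N second images against the cached target face region.

    Returns True while the user is still tracked. On the first window in which no
    second face region reaches the threshold, evicts the cache entry and returns False,
    meaning extraction of first images to be recognized must restart.
    A trailing window shorter than N is left unchecked until more frames arrive.
    """
    target = cache[user_id]
    for start in range(0, len(frames_regions) - n + 1, n):
        window = frames_regions[start:start + n]
        if not any(iou(target, r) >= second_threshold for regions in window for r in regions):
            del cache[user_id]
            return False
    return True


cache = {"user-001": (0, 0, 10, 10)}  # hypothetical user identifier
frames_regions = [[(1, 0, 10, 10)], [], [], [(50, 50, 10, 10)]]
print(track_user(cache, "user-001", frames_regions, n=2))  # False
print(cache)  # {}
```

The first window (frames 1 and 2) still matches the cached region. Nothing in the second window does, so the entry is evicted.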
In this embodiment of the present application, after determining the user identifier of the target user, the server sends it to the first terminal. The first terminal stores the user identifier and the target face region. It then uses the target face region to evaluate the second images to be recognized that are subsequently extracted from the video stream, thereby tracking the target user. As long as tracking of the target user is not lost, the first terminal does not crop the second images to be recognized. Nor does it send cropped images to the server to identify the target user again. This prevents the terminal from sending every cropped image to the server, which would greatly increase the server's computing load. Within a target user's tracking period, this embodiment therefore reduces the computing load on both the terminal and the server and correspondingly speeds up the identification of each target user. When tracking of a target user is lost, the first terminal and the server identify the target user again.
In this embodiment of the present application, the method proceeds as follows:
1. At least two frames of the first image to be recognized are extracted from the video stream.
2. When the IoU between the first face region of each frame and that of the adjacent preceding frame is greater than or equal to the first set threshold, a target image is selected from these frames. Its first face region is determined as the target face region.
3. The coordinate positions of the first face key points within the target face region are detected, and the target image is cropped according to those positions.
4. The cropped image is sent to the server for face recognition to determine the identity information of the target user.
As soon as the captured video stream contains the target user's face, the target user's identity information becomes available. Staff can then promptly tailor a marketing plan to that user, improving marketing efficiency and the user's experience. Moreover, the target image is selected only when all IoUs corresponding to the at least two frames are greater than or equal to the first set threshold. This prevents the object detection model from mistaking a region containing some other object for the first face region, which would otherwise make it impossible to determine the target user's identity information later.
Referring to FIG. 3, a flowchart of another identity recognition method according to an embodiment of the present application is shown. The method is applied to a server and may specifically include the following steps:
Step 301: Receive a cropped image sent by the first terminal. The cropped image is obtained by the first terminal by cropping the target image according to the target face region.
In this embodiment of the present application, the first terminal acquires the video stream captured by the camera in real time and extracts at least one frame of the first image to be recognized from it. Based on the object detection model, it detects the first face region in which the face in each frame is located. It then selects a target image from among these frames and determines the first face region of the target image as the target face region. Next, it crops the target image according to the target face region to obtain the cropped image. Finally, it sends the cropped image to the server.
Correspondingly, the server receives the cropped image sent by the first terminal. The cropped image is obtained by the first terminal by cropping the target image according to the target face region. Specifically, it contains the first face key points within the target face region of the target image.
Step 302: Extract the first face feature points from the cropped image.
In this embodiment of the present application, after receiving the cropped image sent by the first terminal, the server extracts the first face feature points from it. The first face feature points may be one or more feature points of the face, such as the nose, left eye, right eye, and mouth.
Step 303: Compare the first face feature points with the face feature points stored in the server to determine the user identifier corresponding to the target user.
In this embodiment of the present application, the server is provided with a face feature library and a user identity information database:
- The face feature library stores the face feature points of each user.
- The user identity information database stores the identity information of each user.
- The server also stores a user identifier for each user. This identifier corresponds one-to-one with that user's face feature points in the face feature library and with that user's identity information in the user identity information database.
After extracting the first face feature points from the cropped image, the server compares them with each set of face feature points stored in the face feature library. When the similarity between the first face feature points and a set of target face feature points stored in the library is greater than a similarity threshold, the two are determined to match. The server then looks up the user identifier corresponding to the target face feature points. This is the user identifier of the target user corresponding to the cropped image.
Step 304: Obtain the identity information of the target user corresponding to the user identifier.
In this embodiment of the present application, user identifiers correspond one-to-one with the face feature points stored in the face feature library and with the identity information stored in the user identity information database. After obtaining the user identifier of the target user, the server can therefore use it to query the user identity information database for the target user's identity information.
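Steps 303 and 304 can be sketched as a nearest-match search over the face feature library followed by a lookup in the identity database. The application does not say how feature points are represented or compared. This sketch therefore assumes each user's features are stored as one fixed-length vector and compared by cosine similarity. The similarity threshold of 0.6 is illustrative, and the library contents, user identifiers, and identity records are hypothetical.

```python
from typing import Dict, Optional

import numpy as np


def match_user(query: np.ndarray, feature_library: Dict[str, np.ndarray],
               similarity_threshold: float = 0.6) -> Optional[str]:
    """Return the user identifier whose stored features are most similar to `query`,
    provided the cosine similarity is greater than the threshold; otherwise None."""
    q = query / np.linalg.norm(query)
    best_id, best_sim = None, similarity_threshold
    for user_id, features in feature_library.items():
        sim = float(np.dot(q, features / np.linalg.norm(features)))
        if sim > best_sim:
            best_id, best_sim = user_id, sim
    return best_id


# Hypothetical face feature library and user identity information database.
feature_library = {"user-001": np.array([1.0, 0.0, 0.0]), "user-002": np.array([0.0, 1.0, 0.0])}
identity_db = {"user-001": {"name": "Customer A"}, "user-002": {"name": "Customer B"}}

user_id = match_user(np.array([0.9, 0.1, 0.0]), feature_library)
print(user_id, identity_db.get(user_id))  # user-001 {'name': 'Customer A'}
```

Because the user identifier is shared between the two stores, the identity lookup in Step 304 is a single dictionary access once a match is found.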
The identity information of the target user includes the target user's name, age, gender, ID card number, mobile phone number, occupation, educational background, and similar information.
In addition, the server can compile statistics on each target user's visit times and number of visits. These are based on when, and how many times, that user's identity information has been recognized.
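One possible way to keep these statistics is a per-user list of recognition timestamps, sketched below. The class name `VisitLog` is hypothetical. The sketch also assumes that every recognition counts as one visit. This roughly fits the tracking scheme above, in which a user who is still being tracked is not re-identified, but the application does not state it.

```python
from collections import defaultdict
from datetime import datetime
from typing import DefaultDict, List, Optional


class VisitLog:
    """Record when each user was recognized and derive visit statistics."""

    def __init__(self) -> None:
        self._times: DefaultDict[str, List[datetime]] = defaultdict(list)

    def record(self, user_id: str, when: datetime) -> None:
        self._times[user_id].append(when)

    def visit_count(self, user_id: str) -> int:
        return len(self._times.get(user_id, []))

    def last_visit(self, user_id: str) -> Optional[datetime]:
        times = self._times.get(user_id)
        return max(times) if times else None


log = VisitLog()
log.record("user-001", datetime(2020, 12, 1, 9, 30))
log.record("user-001", datetime(2020, 12, 8, 14, 5))
print(log.visit_count("user-001"), log.last_visit("user-001"))  # 2 2020-12-08 14:05:00
```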
Further, after step 304, the method also includes: sending the identity information of the target user to a second terminal, so that the second terminal issues a reminder of the target user's visit.
In this embodiment of the present application, after identifying the target user, the server sends the target user's identity information to the second terminal. The second terminal receives this information and shows it on its display screen to notify the relevant staff of the target user's visit. Staff can view the target user's identity information on the second terminal and promptly formulate a marketing plan for the target user, improving marketing efficiency.
The second terminal may be a display screen deployed at the relevant venue, such as a display screen in a bank's business hall. It may also be a terminal device designated by the relevant staff, such as a mobile phone or computer held by a product manager.
进一步的,在步骤303之后,还包括:将所述目标用户的用户标识发送至所述第一终端,以通过所述第一终端计算连续提取的N帧第二待识别图像对应的第二人脸区域与所述目标人脸区域的交并比,从而实现所述目标用户的跟踪;所述N为大于1的正整数。Further, after step 303, it also includes: sending the user identification of the target user to the first terminal, so as to calculate the second person corresponding to the N frames of the second to-be-identified images continuously extracted through the first terminal The intersection ratio between the face area and the target face area, so as to realize the tracking of the target user; the N is a positive integer greater than 1.
在本申请实施例中,服务器在确定目标用户对应的用户标识之后,将目标用户的用户标识发送至第一终端,第一终端接收服务器发送的目标用户的用户标识,并将用户标识和用户标识对应的目标人脸区域缓存下来。In the embodiment of the present application, after determining the user ID corresponding to the target user, the server sends the user ID of the target user to the first terminal, and the first terminal receives the user ID of the target user sent by the server, and sends the user ID and the user ID to the first terminal. The corresponding target face area is cached.
Next, the first terminal sequentially extracts N frames of second to-be-recognized images from the video stream. These are video frames that come after the first to-be-recognized image in the stream. Using the object detection model, the first terminal detects the second face region containing a face in each of these frames. It then computes the IoU between the target face region and the second face region of each frame.
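The IoU used here is the standard ratio of the area where two boxes overlap to the area of their union. Below is a minimal Python sketch. It assumes each face region is an axis-aligned box given as (x1, y1, x2, y2) pixel coordinates. The box format and the function name `iou` are illustrative assumptions and do not come from the application.

```python
from typing import Tuple

# Hypothetical box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return the intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle; it is empty if the boxes do not overlap.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by half a width overlap in 50 of 150 square pixels.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 0.3333333333333333
```

A value of 1.0 means the two regions coincide, and 0.0 means they do not overlap at all.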
The IoU values between the target face region and the second face regions of the N frames determine whether tracking of the target user has been lost. If at least one second face region across the N consecutively extracted frames has an IoU with the target face region that is greater than or equal to a second set threshold, the target user is still being tracked. If every second face region in the N frames has an IoU below the second set threshold, tracking of that target user is deemed lost.
If the target user is still being tracked, the first terminal goes on to extract the next N frames of second to-be-recognized images and repeats the subsequent steps. If tracking is lost, it deletes the cached user ID and target face region, then restarts from the step of extracting first to-be-recognized images from the video stream.
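The tracking decision above can be sketched as a small cache on the first terminal. This is a hypothetical Python sketch with these assumptions:

- `FaceTracker`, its method names and the box format are invented for illustration.
- Each frame may contain several detected face regions.
- The cached target face region is kept fixed, as the text describes.
- The threshold value in the usage line is a placeholder.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x1, y1, x2, y2) format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class FaceTracker:
    """Hypothetical tracking cache: user ID -> cached target face region."""

    def __init__(self, n_frames: int, threshold: float) -> None:
        if n_frames <= 1:
            raise ValueError("N must be an integer greater than 1")
        self.n_frames = n_frames
        self.threshold = threshold  # plays the role of the "second set threshold"
        self.cache: Dict[str, Box] = {}

    def add(self, user_id: str, target_region: Box) -> None:
        """Cache a user ID received from the server with its target face region."""
        self.cache[user_id] = target_region

    def check_window(self, window: Sequence[Sequence[Box]]) -> List[str]:
        """Check N consecutive frames of detected second face regions.

        A user is still tracked if any region in any of the N frames has an
        IoU >= threshold with the cached target region. Lost users are removed
        from the cache and returned so the caller can restart detection.
        """
        if len(window) != self.n_frames:
            raise ValueError("expected exactly N frames")
        lost: List[str] = []
        for user_id, target in list(self.cache.items()):
            tracked = any(
                iou(target, region) >= self.threshold
                for frame in window
                for region in frame
            )
            if not tracked:
                del self.cache[user_id]
                lost.append(user_id)
        return lost


tracker = FaceTracker(n_frames=3, threshold=0.5)
tracker.add("u1", (0, 0, 10, 10))
tracker.add("u2", (100, 100, 110, 110))
print(tracker.check_window([[(1, 0, 11, 10)], [], [(50, 50, 60, 60)]]))  # ['u2']
```

In this design, a user still being tracked never reaches the server again. Only lost users trigger a new round of first-image extraction and recognition.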
Tracking target users on the first terminal reduces the computational load on both the terminal and the server. A user who has already been identified and is still being tracked does not need to be recognized again, so each target user's identity is also identified faster.
FIG. 4 shows a flowchart of how a registered user's facial feature points and identity information are registered in an embodiment of the present application. The process may include the following steps:
Step 401: Receive a face image and a registered user's identity information sent by a third terminal.
In this embodiment, the registered user enters a face image and their identity information on the third terminal. The third terminal sends both to the server, which receives them.
The third terminal may capture the face image in real time, or the image may already be stored on it. The third terminal may be a device deployed on site, such as a terminal in a bank's business hall that can capture face images. It may also be a device held by the registered user, such as the user's mobile phone.
Step 402: Using the object detection model, detect the third face region containing the face in the face image.
In this embodiment, the server also stores a pre-trained object detection model, which may be an SSD (Single Shot MultiBox Detector) model.
After receiving the face image and the registered user's identity information from the third terminal, the server inputs the face image into the object detection model. The model outputs the third face region containing the face in the image.
Like the other face regions, the third face region is a rectangular box, and the detected region is represented by its position information.
Step 403: Extract the second facial feature points within the third face region.
In this embodiment, after detecting the third face region in the face image, the server extracts the second facial feature points within that region. These may be one or more feature points of the face, such as the nose, left eye, right eye and mouth.
Step 404: Compare the second facial feature points with the facial feature points stored on the server to determine whether the second facial feature points are already stored there.
In this embodiment, after extracting the second facial feature points, the server checks whether they match any of its stored facial feature points. This tells the server whether the second facial feature points are already stored, and therefore whether the user has already registered.
Step 405: If the second facial feature points are already stored on the server, return registration error information to the third terminal.
In this embodiment, if the second facial feature points match any stored facial feature points, they are considered already stored, which means the user has registered before. The server then returns registration error information to the third terminal to tell the user they are already registered.
In practice, two different users may look alike. So when the second facial feature points are found to be already stored, the server may also send information to a terminal device used by staff, so they can handle the case promptly, for example by registering the user manually. This information includes:

- the registration error information;
- the registering user's identity information;
- the identity information associated with the stored facial feature points that matched.
Step 406: If the second facial feature points are not stored on the server, save the second facial feature points and the registered user's identity information, and generate a user ID associated with both.
In this embodiment, if the second facial feature points match none of the stored facial feature points, they are considered not yet stored on the server. The server saves the second facial feature points in the facial feature library and the registered user's identity information in the user identity information database. It then generates a user ID associated with both.
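The application does not specify how facial feature points are compared. The hypothetical sketch below of steps 404 to 406 therefore assumes the following:

- Each face is represented by a numeric feature vector.
- Two faces "match" when their cosine similarity reaches a threshold.
- User IDs are generated with `uuid.uuid4`.
- Plain dictionaries stand in for the facial feature library and the user identity information database.

The threshold value and all class and method names are illustrative.

```python
import math
import uuid
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class Registry:
    """Hypothetical in-memory feature library and identity database keyed by user ID."""

    def __init__(self, match_threshold: float = 0.8) -> None:
        self.match_threshold = match_threshold  # assumed similarity cut-off
        self.features: Dict[str, List[float]] = {}  # stands in for the feature library
        self.identities: Dict[str, dict] = {}  # stands in for the identity database

    def find_match(self, feature: Sequence[float]) -> Optional[str]:
        """Step 404: return the best-matching stored user ID, or None."""
        best_id, best_score = None, self.match_threshold
        for user_id, stored in self.features.items():
            score = cosine_similarity(feature, stored)
            if score >= best_score:
                best_id, best_score = user_id, score
        return best_id

    def register(self, feature: Sequence[float], identity: dict) -> dict:
        matched = self.find_match(feature)
        if matched is not None:
            # Step 405: already registered; the matched ID lets staff compare identities.
            return {"ok": False, "error": "already registered", "matched_user_id": matched}
        # Step 406: store feature and identity under a newly generated user ID.
        user_id = uuid.uuid4().hex
        self.features[user_id] = list(feature)
        self.identities[user_id] = dict(identity)
        return {"ok": True, "user_id": user_id}
```

Returning the matched user ID on error is one way to support the staff review of look-alike users described above.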
Note that the first, second and third terminals are distinct devices. In practice, the first terminal is a development board attached to the back end of the camera, such as an rk3399 development board.
Further, after step 402, the method also includes:

1. Using a face occlusion model, determine whether the face in the third face region is occluded.
2. If the face is not occluded, detect the coordinate positions of the second facial keypoints in the third face region.
3. From those coordinate positions, determine whether the face in the third face region is a frontal face.
4. If the face is frontal, carry out the step of extracting the second facial feature points in the third face region and the steps that follow.
In this embodiment, the server stores a pre-trained face occlusion model, which is a neural network model. After detecting the third face region in the face image, the server inputs the face image into the face occlusion model. The model outputs a result indicating whether the face in the third face region is occluded.
If the face in the third face region is not occluded, the server detects the second facial keypoints in that region and determines their coordinate positions. The second facial keypoints include the left eye, right eye, nose, left mouth corner, right mouth corner and so on. Their coordinate positions are therefore the coordinates of each of these points in the face image.
The server then uses the coordinate positions of the second facial keypoints to decide whether the face in the third face region is frontal. It does this by checking the distance between each pair of second facial keypoints:

- If every pairwise distance falls within its corresponding preset range, the face is frontal.
- If any pairwise distance falls outside its preset range, the face is not frontal.
For example, suppose the distance between the left and right eyes is L1 and the corresponding preset range is [L2, L3]. If L1 lies outside [L2, L3], the face in the third face region is not frontal.
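A minimal Python sketch of this pairwise-distance check follows. The keypoint names and range values are placeholders, since the application gives no concrete ranges. `ranges` maps each checked keypoint pair to its inclusive preset range, which plays the role of [L2, L3] for that pair. `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def is_frontal(
    keypoints: Dict[str, Point],
    ranges: Dict[Tuple[str, str], Tuple[float, float]],
) -> bool:
    """Return True only if every checked keypoint pair lies within its preset range."""
    for (a, b), (low, high) in ranges.items():
        distance = math.dist(keypoints[a], keypoints[b])  # Euclidean distance in pixels
        if not low <= distance <= high:
            return False  # one out-of-range pair is enough to reject the face
    return True


# Placeholder keypoints and ranges for illustration only.
kp = {"left_eye": (30, 40), "right_eye": (70, 40), "nose": (50, 60)}
limits = {("left_eye", "right_eye"): (30, 50), ("left_eye", "nose"): (20, 40)}
print(is_frontal(kp, limits))  # True
```

Because the distances are in pixels, any real ranges would have to match the scale of the face images being checked.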
The server extracts the second facial feature points in the third face region, and carries out the steps that follow (steps 403 to 406), only when it has determined that the face is frontal.
If the face in the third face region is occluded or not frontal, the face image sent by the third terminal does not meet the requirements. The process ends, and the server does not register the user from that face image and identity information.
In this embodiment, the server extracts the first facial feature points from the cropped image and compares them with the facial feature points it stores to determine the target user's user ID. It then obtains the identity information for that user ID. When the captured video stream contains the target user's face, the cropped image contains it as well, so the target user's identity can be learned promptly from the cropped image. Staff can then tailor a marketing plan to the user's identity in time, which improves both marketing efficiency and the user's experience.
FIG. 5 shows a structural block diagram of a terminal according to an embodiment of the present application.
The terminal 500 provided in this embodiment is a first terminal and includes:
a first to-be-recognized image extraction module 501, configured to extract at least one frame of first to-be-recognized image from a video stream;
a first face region detection module 502, configured to use an object detection model to detect the first face region containing the face in each frame of first to-be-recognized image;
a target face region determination module 503, configured to select a target image from the frames of first to-be-recognized images and take the first face region corresponding to the target image as the target face region;
a target image cropping module 504, configured to crop the target image according to the target face region to obtain a cropped image; and
a cropped image sending module 505, configured to send the cropped image to a server, which recognizes the cropped image to determine the target user's identity information.
Optionally, the first to-be-recognized image extraction module 501 includes:
a first to-be-recognized image extraction submodule, configured to extract at least two frames of first to-be-recognized images from the video stream at intervals of a preset number of frames;
and the target face region determination module 503 includes:
an IoU calculation submodule, configured to compute, for the at least two frames of first to-be-recognized images, the IoU between the first face region of each frame and the first face region of the immediately preceding frame; and
a target face region determination submodule, configured to act when every IoU for the at least two frames is greater than or equal to a first set threshold. In that case it selects any one of these frames as the target image and takes the first face region corresponding to the target image as the target face region.
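The interval sampling and stability check performed by modules 501 and 503 can be sketched as follows. This hypothetical Python sketch makes these assumptions:

- Each sampled frame contains exactly one detected face region.
- Boxes use an (x1, y1, x2, y2) format.
- Among the equally valid choices, the latest frame is picked as the target image.

The function names and the threshold value are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

Box = Tuple[float, float, float, float]  # hypothetical (x1, y1, x2, y2) format
T = TypeVar("T")


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_frames(frames: Sequence[T], interval: int, count: int) -> List[T]:
    """Take `count` frames, one every `interval` frames (the preset frame count)."""
    return list(frames[::interval][:count])


def select_target(regions: Sequence[Box], threshold: float) -> Optional[int]:
    """Return the index of the target image, or None if the face is not stable.

    regions[i] is the first face region in sampled frame i. Every adjacent
    pair must have IoU >= threshold (the "first set threshold").
    """
    if len(regions) < 2:
        return None
    for prev, cur in zip(regions, regions[1:]):
        if iou(prev, cur) < threshold:
            return None
    return len(regions) - 1  # any frame qualifies; this sketch takes the latest


print(sample_frames(list(range(20)), interval=5, count=3))  # [0, 5, 10]
```

Requiring high IoU between neighbouring samples means only a face that has stayed roughly in place is cropped and sent to the server.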
Optionally, the target image cropping module 504 includes:
a coordinate position detection submodule, configured to detect the coordinate positions of the first facial keypoints in the target face region; and
a target image cropping submodule, configured to use the coordinate positions of the first facial keypoints to cut out, from the target image, a region containing those keypoints, yielding the cropped image.
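The application does not say how large the cropped region is relative to the keypoints. The hypothetical Python sketch below assumes the crop is the keypoints' bounding box, enlarged on each side by a `margin` fraction of its size and clamped to the image bounds. The 0.5 margin and the function names are illustrative. The image is a plain list of rows so the sketch stays dependency-free.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def keypoint_crop_box(
    keypoints: Sequence[Point], width: int, height: int, margin: float = 0.5
) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) covering the keypoints plus a margin."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    dx = (max(xs) - min(xs)) * margin  # horizontal padding on each side
    dy = (max(ys) - min(ys)) * margin  # vertical padding on each side
    left = max(0, math.floor(min(xs) - dx))
    top = max(0, math.floor(min(ys) - dy))
    right = min(width, math.ceil(max(xs) + dx))
    bottom = min(height, math.ceil(max(ys) + dy))
    return left, top, right, bottom


def crop(image: List[list], box: Tuple[int, int, int, int]) -> List[list]:
    """Crop a row-major image given as a list of rows (right/bottom exclusive)."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


# Eyes, nose and mouth corners of a hypothetical face in a 200x200 frame.
kps = [(30, 40), (70, 40), (50, 60), (35, 80), (65, 80)]
print(keypoint_crop_box(kps, 200, 200))  # (10, 20, 90, 100)
```

For a NumPy image array, the same box would be applied as `image[top:bottom, left:right]`.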
Optionally, the terminal 500 further includes:
a user ID receiving module, configured to receive the target user's user ID sent by the server and to store the user ID together with its corresponding target face region;
a second to-be-recognized image extraction module, configured to sequentially extract N frames of second to-be-recognized images from the video stream, where the second to-be-recognized images are video frames that come after the first to-be-recognized image in the stream and N is a positive integer greater than 1;
a second face region detection module, configured to use the object detection model to detect the second face region containing the face in each frame of second to-be-recognized image;
an IoU calculation module, configured to compute the IoU between the target face region and the second face region of each frame of second to-be-recognized image;
where the second to-be-recognized image extraction module is executed again if at least one second face region in the N consecutively extracted frames has an IoU with the target face region greater than or equal to a second set threshold;
and a user ID deletion module, configured to act when every second face region in the N consecutively extracted frames has an IoU with the target face region below the second set threshold. In that case it deletes the user ID and the target face region and re-executes the first to-be-recognized image extraction module 501.
Optionally, the first face region detection module 502 includes:
a first to-be-recognized image compression submodule, configured to compress each frame of first to-be-recognized image so that the compressed image is smaller than the original; and
a first face region detection submodule, configured to input each compressed frame of first to-be-recognized image into an SSD model to obtain the first face region containing the face in each frame.
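The application only states that frames are compressed before they reach the SSD model. It does not say how detected boxes relate back to the original frame. One common arrangement, assumed in the sketch below, is:

1. Resize each frame to the detector's fixed input size (300x300 for the SSD300 variant).
2. Map detected boxes back to original-frame pixels using per-axis scale factors.

The resize call itself is omitted, and the function names are hypothetical. If a detector returns normalized [0, 1] coordinates, multiply them by the original width and height instead.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # hypothetical (x1, y1, x2, y2) format


def scale_factors(
    orig_w: int, orig_h: int, in_w: int = 300, in_h: int = 300
) -> Tuple[float, float]:
    """Per-axis factors for resizing a frame to the assumed detector input size."""
    if in_w * in_h >= orig_w * orig_h:
        raise ValueError("compressed frame must be smaller than the original")
    return in_w / orig_w, in_h / orig_h


def box_to_original(box: Box, sx: float, sy: float) -> Box:
    """Map a box detected on the compressed frame back to original-frame pixels."""
    x1, y1, x2, y2 = box
    return x1 / sx, y1 / sy, x2 / sx, y2 / sy


sx, sy = scale_factors(1920, 1080)
print(box_to_original((30, 30, 60, 60), sx, sy))  # roughly (192.0, 108.0, 384.0, 216.0)
```

Running detection on the smaller frame reduces the compute load on the first terminal. Mapping the boxes back keeps later cropping in full resolution.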
In this embodiment, the terminal works as follows:

1. It extracts at least one frame of first to-be-recognized image from the video stream and selects a target image from them.
2. It takes the first face region corresponding to the target image as the target face region.
3. It crops the target image according to the target face region.
4. It sends the cropped image to the server, which performs face recognition to determine the target user's identity information.

As a result, when the captured video stream contains the target user's face, the user's identity can be learned promptly. Staff can then tailor a marketing plan to it in time, which improves both marketing efficiency and the user's experience.
FIG. 6 shows a structural block diagram of a server according to an embodiment of the present application.
The server 600 provided in this embodiment includes:
a cropped image receiving module 601, configured to receive a cropped image sent by a first terminal, where the first terminal obtained the cropped image by cropping a target image according to a target face region;
a first facial feature point extraction module 602, configured to extract the first facial feature points from the cropped image;
a user ID determination module 603, configured to compare the first facial feature points with the facial feature points stored on the server to determine the user ID corresponding to the target user; and
an identity information acquisition module 604, configured to obtain the target user's identity information corresponding to the user ID.
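Modules 602 to 604 amount to a nearest-match lookup followed by a database query. The hypothetical Python sketch below makes these assumptions, none of which are specified by the application:

- Feature points are represented as numeric vectors.
- Matching uses cosine similarity with a threshold.
- Dictionaries stand in for the facial feature library and the user identity information database.

All names and the threshold are illustrative.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def identify(
    feature: Sequence[float],
    feature_library: Dict[str, Sequence[float]],
    identity_db: Dict[str, dict],
    threshold: float = 0.8,
) -> Optional[dict]:
    """Return {"user_id", "identity"} for the best match above threshold, else None."""
    best_id, best_score = None, threshold
    for user_id, stored in feature_library.items():  # module 603: compare features
        score = cosine_similarity(feature, stored)
        if score >= best_score:
            best_id, best_score = user_id, score
    if best_id is None:
        return None  # unknown face: no user ID can be determined
    # Module 604: look up identity information by user ID.
    return {"user_id": best_id, "identity": identity_db.get(best_id)}


library = {"u1": [1.0, 0.0], "u2": [0.0, 1.0]}
db = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
print(identify([0.1, 0.9], library, db))  # {'user_id': 'u2', 'identity': {'name': 'Bob'}}
```

A production system would typically use an indexed nearest-neighbour search rather than this linear scan once the feature library grows large.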
Optionally, the server 600 further includes:
an identity information sending module, configured to send the target user's identity information to a second terminal, which alerts staff to the target user's visit.
Optionally, the server 600 further includes:
a user ID sending module, configured to send the target user's user ID to the first terminal. The first terminal then tracks the target user by computing the IoU between the target face region and the second face regions of N consecutively extracted frames of second to-be-recognized images, where N is a positive integer greater than 1.
Optionally, the server 600 further includes:
a face image receiving module, configured to receive a face image and a registered user's identity information sent by a third terminal;
a third face region detection module, configured to use the object detection model to detect the third face region containing the face in the face image;
a second facial feature point extraction module, configured to extract the second facial feature points within the third face region;
a facial feature point comparison module, configured to compare the second facial feature points with the facial feature points stored on the server to determine whether the second facial feature points are already stored there;
a registration error information return module, configured to return registration error information to the third terminal when the second facial feature points are already stored on the server; and
a second facial feature point saving module, configured to act when the second facial feature points are not stored on the server. In that case it saves the second facial feature points and the registered user's identity information, and generates a user ID associated with both.
Optionally, the server 600 further includes:
an occlusion determination module, configured to use a face occlusion model to determine whether the face in the third face region is occluded;
a coordinate position detection module, configured to detect the coordinate positions of the second facial keypoints in the third face region when the face in that region is not occluded; and
a frontal face determination module, configured to use the coordinate positions of the second facial keypoints to determine whether the face in the third face region is a frontal face;
where the second facial feature point extraction module is executed only when the face in the third face region is a frontal face.
In this embodiment, the server extracts the first facial feature points from the cropped image and compares them with the facial feature points it stores to determine the target user's user ID. It then obtains the identity information for that user ID. When the captured video stream contains the target user's face, the cropped image contains it as well, so the target user's identity can be learned promptly from the cropped image. Staff can then tailor a marketing plan to the user's identity in time, which improves both marketing efficiency and the user's experience.
Correspondingly, an embodiment of the present application further provides a terminal that includes a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements the steps of the terminal-side identity recognition method described above.
An embodiment of the present application further provides a computer-readable medium storing a computer program. When executed by a processor, the program implements the steps of the terminal-side identity recognition method described above.
Correspondingly, an embodiment of the present application further provides a server that includes a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements the steps of the server-side identity recognition method described above.
An embodiment of the present application further provides a computer-readable medium storing a computer program. When executed by a processor, the program implements the steps of the server-side identity recognition method described above.
FIG. 7 shows a structural diagram of an identity recognition system according to an embodiment of the present application.
An embodiment of the present application further provides an identity recognition system. It includes a camera 701, a second terminal 702, a third terminal 703, the first terminal 500 described above and the server 600 described above.
The components are configured as follows:

- The camera 701 captures a video stream and sends it to the first terminal 500.
- The second terminal 702 receives the target user's identity information from the server 600 and alerts staff to the target user's visit.
- The third terminal 703 sends a face image and the registered user's identity information to the server 600. It receives registration error information from the server 600 when the second facial feature points contained in that face image are already stored on the server 600.
In this embodiment, the camera 701 sends the video stream to the first terminal 500. The first terminal extracts video frames from the stream and selects first and second to-be-recognized images from them. Video frames that come after the first to-be-recognized image in the stream serve as second to-be-recognized images.
The first terminal 500 inputs the first to-be-recognized images into the object detection model to obtain the first face region in each frame. It selects a target image from these frames and takes the first face region corresponding to the target image as the target face region. It then detects the first facial keypoints in the target face region, determines their coordinate positions, and uses those positions to cut out a region of the target image containing the keypoints, yielding the cropped image.
The first terminal 500 sends the cropped image to the server 600. The server 600 extracts the first face feature points from the cropped image and compares them with the face feature points stored in the face feature database to determine the user identifier corresponding to the target user. Using this user identifier, the server 600 looks up the target user's identity information in the user identity information database. It then sends this information to the second terminal 702, which issues a visit reminder for the target user.
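The server-side comparison and lookup can be sketched as below. The source does not specify the comparison metric. This sketch assumes that each face's features form a fixed-length vector and that vectors are compared by cosine similarity, a common but here hypothetical choice. `feature_db` and `identity_db` are plain dictionaries standing in for the face feature database and the user identity information database. The 0.6 threshold is also an assumption.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_user(
    query: Sequence[float],
    feature_db: Dict[str, Sequence[float]],
    threshold: float = 0.6,  # hypothetical acceptance threshold
) -> Optional[str]:
    """Return the user identifier whose stored features best match `query`,
    or None if no stored entry reaches `threshold`."""
    best_id: Optional[str] = None
    best_score = threshold
    for user_id, stored in feature_db.items():
        score = cosine_similarity(query, stored)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


def identify(
    query: Sequence[float],
    feature_db: Dict[str, Sequence[float]],
    identity_db: Dict[str, dict],
    threshold: float = 0.6,
) -> Optional[dict]:
    """Feature comparison followed by identity lookup; the returned record is
    what would be pushed to the second terminal as a visit reminder."""
    user_id = match_user(query, feature_db, threshold)
    return identity_db.get(user_id) if user_id is not None else None
```

Keeping the best score rather than the first hit above the threshold prevents an arbitrary dictionary order from deciding between two similar registered users.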
In addition, the server 600 also sends the user identifier of the target user to the first terminal 500. The first terminal 500 caches the user identifier together with its corresponding target face region. The first terminal 500 likewise inputs the second images to be recognized into the target detection model to obtain the second face region in which the face in each second image to be recognized is located. It then computes the intersection over union (IoU) between the target face region and the second face regions of N consecutively extracted frames of second images to be recognized, thereby tracking the target user.
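The tracking rule above (and in claim 4) can be sketched with a plain IoU function and a small cache. This is a minimal illustration under stated assumptions, not the applicant's code:

- Boxes are assumed to be `(x1, y1, x2, y2)` tuples.
- `FaceTrackCache` and the 0.5 value for the "second set threshold" are hypothetical.

The sketch follows the claim: every box is compared with the cached target face region, which is not updated as the face moves. Tracking continues if any box in the N-frame window overlaps that region enough. Otherwise, the cached user identifier and region are deleted, so that detection and recognition start over.

```python
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class FaceTrackCache:
    """Hypothetical cache of user_id -> target face region on the first terminal."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold  # "second set threshold"; value assumed
        self.tracks: Dict[str, Box] = {}

    def add(self, user_id: str, target_box: Box) -> None:
        self.tracks[user_id] = target_box

    def update(self, user_id: str, window: Sequence[Sequence[Box]]) -> bool:
        """`window` holds the detected face boxes of N consecutive second images.
        Returns True to keep tracking; otherwise deletes the entry and returns False."""
        target = self.tracks[user_id]
        for frame_boxes in window:
            if any(iou(target, box) >= self.threshold for box in frame_boxes):
                return True
        del self.tracks[user_id]
        return False
```

Because a user who stays in view is matched locally by IoU, the server is only queried again after the face has left the target region for N frames.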
In addition, a user can register through the third terminal 703 by sending a face image and his or her identity information to the server 600. When registration succeeds, the server 600 stores the second face feature points from the face image in the face feature database and stores the registered user's identity information in the user identity information database.
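The registration flow with its duplicate check (claim 9) can be sketched as below. Several details are hypothetical rather than taken from the source:

- As in the recognition step, face features are assumed to be fixed-length vectors compared by cosine similarity.
- The similarity level that counts as "already stored" is not given in the source; the value below is assumed.
- `FaceRegistry`, `RegistrationError` and the use of a random UUID as the user identifier are illustrative choices.
- Detection, occlusion and frontal-face checks are assumed to have already run on the uploaded image.

```python
import math
import uuid
from typing import Dict, List, Sequence


class RegistrationError(Exception):
    """Hypothetical error reported back to the third terminal for a duplicate face."""


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class FaceRegistry:
    def __init__(self, duplicate_threshold: float = 0.6) -> None:
        self.duplicate_threshold = duplicate_threshold  # assumed value
        self.feature_db: Dict[str, List[float]] = {}   # user_id -> face features
        self.identity_db: Dict[str, dict] = {}         # user_id -> identity info

    def register(self, features: Sequence[float], identity: dict) -> str:
        """Reject the face if it is already stored; otherwise save features and
        identity under a newly generated user identifier linking the two."""
        for stored in self.feature_db.values():
            if cosine_similarity(features, stored) >= self.duplicate_threshold:
                raise RegistrationError("face already registered")
        user_id = uuid.uuid4().hex
        self.feature_db[user_id] = list(features)
        self.identity_db[user_id] = dict(identity)
        return user_id
```

Using one generated identifier as the key in both dictionaries mirrors the claim's requirement that the user identifier be associated with both the feature points and the identity information.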
In practice, the camera may be deployed in a bank business hall or any other venue where the identity of a target user needs to be recognized.
In this embodiment of the present application, at least one frame of first image to be recognized is extracted from the video stream. A target image is selected from these frames, and the first face region corresponding to the target image is determined as the target face region. The target image is then cropped according to the target face region, and the cropped image is sent to the server. The server extracts the first face feature points from the cropped image and compares them with the face feature points stored in the server to determine the user identifier corresponding to the target user. It then obtains the identity information of the target user corresponding to that user identifier.

When the captured video stream contains the target user's face, the cropped image also contains that face, so the target user's identity can be learned promptly from the cropped image. Staff can then tailor a marketing plan to the user's identity information in time, improving marketing efficiency and the user experience.
Since the apparatus embodiments are substantially similar to the method embodiments, they are described relatively briefly. For relevant details, refer to the corresponding parts of the description of the method embodiments.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that, in practice, a microprocessor or digital signal processor (DSP) may be used to implement some or all of the functions of some or all of the components of the computing processing device according to the embodiments of the present application.

The present application may also be implemented as a device or apparatus program (for example, a computer program or computer program product) for performing part or all of the methods described herein. Such a program may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, FIG. 8 shows a computing processing device, such as the aforementioned server 600 or terminal 500, that can implement the method according to the present application. The computing processing device typically includes a processor 810 and a computer program product or computer-readable medium in the form of a memory 820. The memory 820 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk, or ROM.

The memory 820 has a storage space 830 for program code 831 for performing any of the method steps described above. For example, the storage space 830 may include individual program codes 831 for implementing the various steps of the above methods. These program codes may be read from, or written to, one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards, or floppy disks.

Such a computer program product is typically a portable or fixed storage unit as described with reference to FIG. 9. The storage unit may have storage segments, storage space, and so on, arranged similarly to the memory 820 in the computing processing device of FIG. 8. The program code may, for example, be compressed in a suitable form. Typically, the storage unit includes computer-readable code 831', that is, code readable by a processor such as 810. When run by the computing processing device, this code causes the device to perform the steps of the methods described above.
Reference herein to "one embodiment," "an embodiment," or "one or more embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Note also that instances of the phrase "in one embodiment" herein do not necessarily all refer to the same embodiment.
The description provided herein sets forth numerous specific details. It will be understood, however, that embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.

The present application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solutions of the present application. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand the following. The technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (17)

  1. An identity recognition method, characterized in that it is applied to a first terminal, the method comprising:
    extracting at least one frame of first image to be recognized from a video stream;
    detecting, based on a target detection model, a first face region in which a face is located in each frame of the first image to be recognized;
    selecting a target image from the frames of the first image to be recognized, and determining the first face region corresponding to the target image as a target face region;
    cropping the target image according to the target face region to obtain a cropped image; and
    sending the cropped image to a server, so that the server recognizes the cropped image to determine identity information of a target user.
  2. The method according to claim 1, wherein the step of extracting at least one frame of first image to be recognized from the video stream comprises:
    extracting at least two frames of first image to be recognized from the video stream at intervals of a preset number of frames;
    and the step of selecting a target image from the frames of the first image to be recognized and determining the first face region corresponding to the target image as the target face region comprises:
    for the at least two frames of the first image to be recognized, calculating, for each frame, the intersection over union between the first face region corresponding to that frame and the first face region corresponding to the adjacent preceding frame; and
    when the intersection-over-union values corresponding to the at least two frames of the first image to be recognized are all greater than or equal to a first set threshold, selecting any one of the at least two frames as the target image, and determining the first face region corresponding to the target image as the target face region.
  3. The method according to claim 1, wherein the step of cropping the target image according to the target face region to obtain the cropped image comprises:
    detecting coordinate positions corresponding to first face key points within the target face region; and
    cropping, according to the coordinate positions corresponding to the first face key points, a region containing the first face key points from the target image to obtain the cropped image.
  4. The method according to claim 1, further comprising, after the step of sending the cropped image to the server so that the server recognizes the cropped image to determine the identity information of the target user:
    receiving a user identifier of the target user sent by the server, and storing the user identifier and the target face region corresponding to the user identifier;
    sequentially extracting N frames of second image to be recognized from the video stream, wherein the second image to be recognized is a video frame located after the first image to be recognized in the video stream, and N is a positive integer greater than 1;
    detecting, based on the target detection model, a second face region in which a face is located in each frame of the second image to be recognized;
    calculating the intersection over union between the target face region and the second face region corresponding to each frame of the second image to be recognized;
    when, among the second face regions corresponding to the N consecutively extracted frames of the second image to be recognized, there is a second face region whose intersection over union with the target face region is greater than or equal to a second set threshold, performing again the step of sequentially extracting N frames of second image to be recognized from the video stream and the subsequent steps; and
    when the intersection-over-union values between the target face region and the second face regions corresponding to the N consecutively extracted frames of the second image to be recognized are all less than the second set threshold, deleting the user identifier and the target face region, and performing again the step of extracting at least one frame of first image to be recognized from the video stream and the subsequent steps.
  5. The method according to claim 1, wherein the step of detecting, based on the target detection model, the first face region in which a face is located in each frame of the first image to be recognized comprises:
    compressing each frame of the first image to be recognized, so that the size of the compressed first image to be recognized is smaller than its size before compression; and
    inputting each compressed frame of the first image to be recognized into an SSD model to obtain the first face region in which a face is located in that frame.
  6. An identity recognition method, characterized in that it is applied to a server, the method comprising:
    receiving a cropped image sent by a first terminal, wherein the cropped image is obtained by the first terminal cropping a target image according to a target face region;
    extracting first face feature points from the cropped image;
    comparing the first face feature points with face feature points stored in the server to determine a user identifier corresponding to a target user; and
    acquiring identity information of the target user corresponding to the user identifier.
  7. The method according to claim 6, further comprising, after the step of acquiring the identity information of the target user corresponding to the user identifier:
    sending the identity information of the target user to a second terminal, so that the second terminal issues a visit reminder for the target user.
  8. The method according to claim 6, further comprising, after the step of comparing the first face feature points with the face feature points stored in the server to determine the user identifier corresponding to the target user:
    sending the user identifier of the target user to the first terminal, so that the first terminal calculates the intersection over union between the target face region and the second face regions corresponding to N consecutively extracted frames of second image to be recognized, thereby tracking the target user, wherein N is a positive integer greater than 1.
  9. The method according to claim 6, further comprising, before the step of comparing the first face feature points with the face feature points stored in the server to determine the user identifier corresponding to the target user:
    receiving a face image and identity information of a registered user sent by a third terminal;
    detecting, based on a target detection model, a third face region in which a face is located in the face image;
    extracting second face feature points within the third face region;
    comparing the second face feature points with the face feature points stored in the server to determine whether the second face feature points are already stored in the server;
    when the second face feature points are stored in the server, returning registration error information to the third terminal; and
    when the second face feature points are not stored in the server, saving the second face feature points and the identity information of the registered user, and generating a user identifier associated with both the second face feature points and the identity information of the registered user.
  10. The method according to claim 9, further comprising, after the step of detecting, based on the target detection model, the third face region in which a face is located in the face image:
    determining, based on a face occlusion model, whether the face in the third face region is occluded;
    when the face in the third face region is not occluded, detecting coordinate positions corresponding to second face key points within the third face region;
    determining, according to the coordinate positions corresponding to the second face key points, whether the face in the third face region is a frontal face; and
    when the face in the third face region is a frontal face, performing the step of extracting the second face feature points within the third face region and the subsequent steps.
  11. A terminal, characterized in that the terminal is a first terminal, and the first terminal comprises:
    a first-image-to-be-recognized extraction module, configured to extract at least one frame of first image to be recognized from a video stream;
    a first face region detection module, configured to detect, based on a target detection model, a first face region in which a face is located in each frame of the first image to be recognized;
    a target face region determination module, configured to select a target image from the frames of the first image to be recognized, and to determine the first face region corresponding to the target image as a target face region;
    a target image cropping module, configured to crop the target image according to the target face region to obtain a cropped image; and
    a cropped image sending module, configured to send the cropped image to a server, so that the server recognizes the cropped image to determine identity information of a target user.
  12. A server, characterized by comprising:
    a cropped image receiving module, configured to receive a cropped image sent by a first terminal, wherein the cropped image is obtained by the first terminal cropping a target image according to a target face region;
    a first face feature point extraction module, configured to extract first face feature points from the cropped image;
    a user identifier determination module, configured to compare the first face feature points with face feature points stored in the server and determine a user identifier corresponding to a target user; and
    an identity information acquisition module, configured to acquire identity information of the target user corresponding to the user identifier.
  13. A terminal, characterized by comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the identity recognition method according to any one of claims 1 to 5.
  14. A computer-readable medium, characterized in that a computer program is stored on the computer-readable medium, and the computer program, when executed by a processor, implements the steps of the identity recognition method according to any one of claims 1 to 5.
  15. A server, characterized by comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the identity recognition method according to any one of claims 6 to 10.
  16. A computer-readable medium, characterized in that a computer program is stored on the computer-readable medium, and the computer program, when executed by a processor, implements the steps of the identity recognition method according to any one of claims 6 to 10.
  17. An identity recognition system, characterized by comprising a camera, a second terminal, a third terminal, the first terminal according to claim 11, and the server according to claim 12;
    wherein the camera is configured to capture a video stream and send the video stream to the first terminal;
    the second terminal is configured to receive the identity information of the target user sent by the server, so as to issue a visit reminder for the target user; and
    the third terminal is configured to send a face image and identity information of a registered user to the server, and to receive registration error information returned by the server when second face feature points contained in the face image are already stored in the server.
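The frame sampling and target-image selection of claim 2 can be sketched as below. This is a minimal illustration under stated assumptions, not the claimed implementation:

- Each sampled frame is assumed to yield exactly one detected face box `(x1, y1, x2, y2)`.
- The 0.7 value for the "first set threshold" is hypothetical.
- Claim 2 allows any qualifying frame to serve as the target image. Picking the most recent one is an arbitrary choice.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def sample_every(frames: Sequence, step: int) -> list:
    """Take one frame every `step` frames (the 'preset number of frames')."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return list(frames[::step])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def select_target_index(
    face_boxes: Sequence[Box], threshold: float = 0.7
) -> Optional[int]:
    """face_boxes[i] is the first face region of the i-th sampled frame.
    Returns the index of the frame to use as the target image, or None when
    fewer than two frames are given or any adjacent pair overlaps too little."""
    if len(face_boxes) < 2:
        return None
    for prev, cur in zip(face_boxes, face_boxes[1:]):
        if iou(prev, cur) < threshold:
            return None  # face moved too much; wait for a steadier sequence
    return len(face_boxes) - 1  # any frame qualifies; the latest is chosen here
```

Requiring high overlap between adjacent sampled frames acts as a cheap stability test, so the uploaded crop is more likely to come from a moment when the visitor's face was steady.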
PCT/CN2020/139827 2020-12-28 2020-12-28 Identity recognition method, terminal, server, and system WO2022140879A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080003701.4A CN115066712A (en) 2020-12-28 2020-12-28 Identity recognition method, terminal, server and system
PCT/CN2020/139827 WO2022140879A1 (en) 2020-12-28 2020-12-28 Identity recognition method, terminal, server, and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/139827 WO2022140879A1 (en) 2020-12-28 2020-12-28 Identity recognition method, terminal, server, and system

Publications (1)

Publication Number Publication Date
WO2022140879A1 true WO2022140879A1 (en) 2022-07-07

Family

ID=82258614

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139827 WO2022140879A1 (en) 2020-12-28 2020-12-28 Identity recognition method, terminal, server, and system

Country Status (2)

Country Link
CN (1) CN115066712A (en)
WO (1) WO2022140879A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184136A (en) * 2015-09-08 2015-12-23 京东方科技集团股份有限公司 Identity recognition method, device and system
CN110276277A (en) * 2019-06-03 2019-09-24 罗普特科技集团股份有限公司 Method and apparatus for detecting facial image
CN110705451A (en) * 2019-09-27 2020-01-17 支付宝(杭州)信息技术有限公司 Face recognition method, face recognition device, terminal and server
CN111696142A (en) * 2020-06-12 2020-09-22 广东联通通信建设有限公司 Rapid face detection method and system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117829739A (en) * 2024-03-05 2024-04-05 清电光伏科技有限公司 Dangerous chemical library informatization management system
CN117829739B (en) * 2024-03-05 2024-06-04 清电光伏科技有限公司 Dangerous chemical library informatization management system

Also Published As

Publication number Publication date
CN115066712A (en) 2022-09-16

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20967232

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP Bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.10.2023)

122 EP: PCT application non-entry in European phase

Ref document number: 20967232

Country of ref document: EP

Kind code of ref document: A1