WO2022198821A1 - Method, apparatus, electronic device, storage medium and program for matching a human face with a human body - Google Patents


Info

Publication number
WO2022198821A1
Authority
WO
WIPO (PCT)
Prior art keywords
human body
face
mask
frame
human
Application number
PCT/CN2021/102829
Other languages
English (en)
French (fr)
Inventor
王彤舟
Original Assignee
深圳市商汤科技有限公司
Application filed by 深圳市商汤科技有限公司
Publication of WO2022198821A1 publication Critical patent/WO2022198821A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/70 Multimodal biometrics, e.g. combining information from different biometric modalities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • the embodiments of the present disclosure relate to the field of computer technology, and in particular, to a method, an apparatus, an electronic device, a storage medium, and a program for matching a human face with a human body.
  • The technology of determining a person's identity from image information is becoming increasingly mature, and matching the face with the human body allows the identity to be determined more accurately.
  • When a detected face and a detected human body belong to the same person, they are associated with that person.
  • In the related art, the face and the human body are associated in advance, and the association process is realized by matching the face with the human body; however, the accuracy of such matching between the face and the human body is low.
  • the embodiments of the present disclosure provide a method, an apparatus, an electronic device, a storage medium and a program for matching a human face with a human body.
  • An embodiment of the present disclosure provides a method for matching a human face with a human body.
  • The method is performed by an electronic device, and the method includes: determining at least one face frame in a target image; determining at least one human body mask in the target image; and obtaining, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask.
  • In this way, the face and human body matching is performed based on the face frame and the human body mask, and the matching relationship between the face and the human body can be accurately obtained.
  • In some embodiments, the determining at least one human body mask in the target image includes:
  • determining at least one human body frame in the target image, and a human body mask in the at least one human body frame;
  • in the case where a single human body frame of the at least one human body frame contains more than one human body mask, determining a target human body mask in the single human body frame;
  • deleting the human body masks other than the target human body mask from the single human body frame.
  • In this way, when a single human body frame contains more than one human body mask, the target human body mask in that frame can be determined and the other human body masks deleted, so that the resulting human body frame contains only one human body mask, which improves the accuracy of face and human body matching.
  • In some embodiments, the determining the target human body mask in the single human body frame includes:
  • determining the two first human body masks with the largest areas in the single human body frame; and, in the case where the difference value of the areas of the two first human body masks is greater than a set threshold, using the first human body mask with the larger area among the two as the target human body mask.
  • In this way, the human body mask with the best image quality in a single human body frame can be quickly obtained, the accuracy of matching the face with the human body can be improved, and the image quality of the matched human body is higher, meeting subsequent usage requirements for the matching result.
  • the method further includes:
  • in the case where the difference value of the areas of the two first human body masks is not greater than the set threshold, deleting the single human body frame and the human body masks in the single human body frame.
  • In this way, a human body frame with poor image quality, together with the human body masks in it, can be deleted, reducing the interference with the matching of the human bodies in other human body masks, improving the matching accuracy between faces and human bodies, and making the image quality of the matched human body higher so as to meet subsequent usage requirements for the matching result.
  • the determining at least one human body frame in the target image includes:
  • in the case where the target image includes multiple human body frames, determining a first human body frame with the highest confidence among the multiple human body frames;
  • determining the degree of overlap between the first human body frame and each second human body frame, where a second human body frame is a human body frame other than the first human body frame among the multiple human body frames;
  • deleting the second human body frames whose degree of overlap is greater than an overlap threshold;
  • determining the second human body frames remaining after the deletion, together with the first human body frame, as the at least one human body frame. In this way, by removing repeated human body frames, human body frames with higher confidence are obtained, which improves the accuracy of the obtained human body frames and the accuracy of the matching between faces and human bodies.
  • In some embodiments, the obtaining, based on the position of the face frame and the position of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask includes:
  • obtaining the matching relationship between the face in the face frame and the human body in the human body mask based on the distance between the face frame and the top of the human body mask.
  • In this way, since the face of a person is usually located at the top of that person's body, the matching relationship between the face and the human body can be accurately obtained.
  • In some embodiments, the obtaining the matching relationship between the face in the face frame and the human body in the human body mask based on the distance between the face frame and the top of the human body mask includes:
  • in the case where the target image contains multiple face frames and multiple human body masks, establishing multiple correspondence sets according to different correspondence modes between the face frames and the human body masks, where a single correspondence set contains a set of one-to-one correspondences between the face frames and the human body masks;
  • determining the matching score of a single correspondence set according to the sum of multiple first distances in the set, where a first distance is the distance between a face frame and the top of the human body mask corresponding to it, and the matching score is negatively correlated with the sum of the first distances;
  • using the correspondences in the correspondence set with the largest matching score as the matching relationships between the face frames and the human body masks in the target image. In this way, the face and human body matching obtained from the correspondence set with the largest matching score is optimal overall, so the obtained matching relationships are more accurate as a whole.
  • the method further includes:
  • storing the matching relationship in a matching relationship library, where the matching relationship library is used to store matching relationships between faces and human bodies;
  • in response to a query request for the identity information of a target human body, searching for the target human body in the matching relationship library; in the case where the target human body is found, determining the face that has a matching relationship with the target human body, and determining the identity information of the target human body according to that face. In this way, after the matching relationship between the face and the human body is obtained, even when an image collected by a camera contains only a human body, the face that has a matching relationship with that human body can be found, and the identity information of the human body in the collected image can then be determined from the matched face.
  • An embodiment of the present disclosure provides a device for matching a human face with a human body, including:
  • a face frame determination unit configured to determine at least one face frame in the target image
  • a human body mask determination unit configured to determine at least one human mask in the target image
  • the matching relationship determining unit is configured to obtain a matching relationship between the face in the face frame and the human body in the human mask based on the position of the face frame and the position of the human body mask.
  • the human body mask determination unit includes:
  • a human body frame determination unit configured to determine at least one human body frame in the target image and a human body mask in the at least one human body frame
  • a target human body mask determination unit configured to determine a target human body mask in the single human body frame when a single human body frame in the at least one human body frame contains more than one human body mask
  • the human body mask deletion unit is configured to delete other human body masks other than the target human body mask in the single human body frame.
  • the target human body mask determination unit includes:
  • a first human body mask determination subunit configured to determine two first human body masks with the largest area in a single human body frame
  • the target human body mask determination subunit is configured to, in the case where the difference value of the areas of the two first human body masks is greater than the set threshold, use the first human body mask with the larger area among the two first human body masks as the target human body mask.
  • the apparatus further includes:
  • the human body frame deletion unit is configured to delete the single human body frame and the human body mask in the single human body frame when the difference value of the areas of the two first human body masks is not greater than the set threshold.
  • the body frame determination unit includes:
  • a first human body frame determination unit configured to determine a first human body frame with the highest confidence in the plurality of human body frames when the target image includes multiple human body frames
  • an overlapping degree determination unit configured to determine the degree of overlap between the first human body frame and each second human body frame, and the second human body frame is a human body frame other than the first human body frame in the plurality of human body frames;
  • a second human body frame deletion unit configured to delete the second human body frame whose overlap degree is greater than the overlap degree threshold in the second human body frame
  • At least one human body frame determination unit is configured to determine the deleted second human body frame and the first human body frame as the at least one human body frame.
  • the matching relationship determining unit is configured to obtain the matching relationship between the face in the face frame and the human body in the human body mask based on the distance between the face frame and the top of the human body mask.
  • the matching relationship determining unit includes:
  • the correspondence set establishment unit is configured to, in the case where the target image includes multiple face frames and multiple human body masks, establish multiple correspondence sets according to different correspondence modes between the face frames and the human body masks, where a single correspondence set contains a set of one-to-one correspondences between the face frames and the human body masks;
  • a matching score determination unit configured to determine the matching score of a single correspondence set according to the sum of multiple first distances in the set; where a first distance is the distance between a face frame and the top of the human body mask corresponding to it, and the matching score is negatively correlated with the sum of the first distances;
  • the matching relationship determination subunit is configured to use the corresponding relationship in the corresponding relationship set with the largest matching score as the matching relationship between the face frames and the human body masks in the target image.
  • the apparatus further includes:
  • a storage unit configured to store the matching relationship in a matching relationship library, where the matching relationship library is used to store the matching relationship between a human face and a human body;
  • a search unit configured to search for the target human body in the matching relationship database in response to a query request for the identity information of the target human body
  • a face determination unit configured to determine a face that has a matching relationship with the target human body when the target human body is found
  • the identity information determining unit is configured to determine the identity information of the target human body according to the face.
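The storage, search, face-determination and identity units above can be sketched as a minimal matching-relationship library. This is an illustrative assumption, not the patent's implementation: the identifiers `body_id`/`face_id` and the `face_identities` registry are hypothetical names introduced here for the example.

```python
class MatchingRelationLibrary:
    """Minimal sketch of a matching relationship library: it stores
    face/body pairs and answers identity queries for a target body.
    Identifiers and the identity registry are illustrative assumptions."""

    def __init__(self):
        self._body_to_face = {}

    def store(self, body_id, face_id):
        # store the matching relationship between a face and a human body
        self._body_to_face[body_id] = face_id

    def query_identity(self, body_id, face_identities):
        """Search for the target body; if found, take the face matched
        with it and look up the identity registered for that face."""
        face_id = self._body_to_face.get(body_id)
        if face_id is None:
            return None  # target body not found in the library
        return face_identities.get(face_id)
```

With this sketch, a camera frame containing only a body can still yield an identity, provided the body was previously matched to a face whose identity is known.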
  • An embodiment of the present disclosure provides an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to invoke the instructions stored in the memory to execute the above method.
  • Embodiments of the present disclosure provide a computer-readable storage medium, on which computer program instructions are stored, and when the computer program instructions are executed by a processor, the foregoing method is implemented.
  • In the embodiments of the present disclosure, the matching relationship between the face in the face frame and the human body in the human body mask is obtained based on the position of the face frame and the position of the human body mask. Compared with face and human body matching performed with a face frame and a human body frame, since the human body mask can accurately reflect the position of the human body, matching based on the face frame and the human body mask can accurately obtain the matching relationship between the face and the human body.
  • FIG. 1 shows a flowchart of a method for matching a face and a human body according to an embodiment of the present disclosure.
  • FIG. 2 shows a schematic diagram of a system architecture of a method for matching a human face and a human body according to an embodiment of the present disclosure.
  • FIG. 3 shows a schematic flow diagram of a method for matching a human face and a human body according to an embodiment of the present disclosure.
  • FIG. 4 shows a schematic diagram of the effect of a method for matching a human face and a human body according to an embodiment of the present disclosure.
  • FIG. 5 shows a schematic diagram of an actual scene of a method for matching a human face and a human body according to an embodiment of the present disclosure.
  • FIG. 6 shows a block diagram of a human face and human body matching apparatus according to an embodiment of the present disclosure.
  • FIG. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
  • Face and human body matching can be realized based on the persons in the same image: when a person in the image contains both the face and the human body, the matching between the face and the human body can be realized conveniently and quickly.
  • In the related art, face frames and human body frames are usually obtained through face detection and human body detection, and face and human body matching is then performed through the face frames and the human body frames. However, a human body frame often cannot accurately reflect the position of the human body, or multiple human bodies may appear in one human body frame, so the accuracy of the matching results is low in more complex scenarios.
  • In the embodiments of the present disclosure, the matching relationship between the face in the face frame and the human body in the human body mask is obtained based on the position of the face frame and the position of the human body mask. Compared with face and human body matching performed with a face frame and a human body frame, since the human body mask can accurately reflect the position of the human body, matching based on the face frame and the human body mask can accurately obtain the matching relationship between the face and the human body.
  • The face and human body matching method provided by the embodiments of the present disclosure has high application value in many fields. For example, in the process of detecting a target object in a scene with a high crowd density, or when the face is occluded, the face of the target object can be obtained through the photographed human body of the target object and the pre-established matching relationship between faces and human bodies, and the identity of the target object can then be determined from the face.
  • The face and human body matching method may be performed by an electronic device such as a terminal device or a server. The terminal device may be a user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular telephone, a cordless telephone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc.
  • The method can be implemented by a processor calling computer-readable instructions stored in a memory.
  • the method may be performed by a server.
  • The execution body of the face and human body matching method may be a face and human body matching apparatus, and the implementation below is introduced with the apparatus as the execution subject. It can be understood that taking the face and human body matching apparatus as the execution subject is only an exemplary illustration and should not be construed as limiting the method.
  • FIG. 1 shows a flowchart of a method for matching a face and a human body according to an embodiment of the present disclosure. As shown in FIG. 1 , the method for matching between a human face and a human body includes:
  • In step S11, at least one face frame in the target image is determined.
  • a face frame is an area in an image containing a face.
  • The face frame is generally a rectangular frame, and the vertices of the rectangle (upper left corner, lower left corner, upper right corner and lower right corner) can be used to represent the precise position of the rectangular frame.
  • the face frame is the positioning result of tracking and positioning the face.
  • The face frame in the target image can be determined using a sliding window: face features are detected within the sliding window, and a window in which face features are detected can be determined as a face frame.
  • The face features may be, for example, face key points, that is, key points on the face such as the eyes (e.g. eye corner, eyeball center, eye tail), the nose (e.g. nose tip, nose wing), the mouth (e.g. lips, lip corners, lip edges), the chin, and the eyebrow corners; based on the detection of these key points, the face frame can be located.
  • In step S12, at least one human body mask in the target image is determined.
  • the human mask is used to indicate the area where the outline of the human body is located.
  • The human body mask can be obtained based on instance segmentation. Instance segmentation distinguishes different individuals of the same type of object on the basis of semantic segmentation, so each determined human body mask corresponds to a single human body; that is, one human body mask corresponds to one human body.
  • Instance segmentation can be implemented based on target detection technology and semantic segmentation technology. For example, human bodies are first detected in the target image, and the pixels corresponding to each human body are then labeled, so that each human body in the image can be distinguished and labeled. For example, a human body frame representing the position of a human body can be obtained through target detection, and on the basis of the human body frame, the contour region of each human body in the frame is segmented to obtain a human body mask.
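Once instance segmentation has produced per-person boolean masks, each human body frame needs the set of masks that fall inside it. A minimal sketch under stated assumptions: masks are boolean NumPy arrays, frames are `(x1, y1, x2, y2)` boxes, and the `min_inside` membership fraction is a hypothetical parameter introduced here, not something the text specifies.

```python
import numpy as np

def masks_in_frame(frame, masks, min_inside=0.5):
    """Collect the instance-segmentation masks falling inside one body
    frame. A mask is assigned to the frame when at least `min_inside`
    of its foreground pixels lie within the frame (assumed criterion)."""
    x1, y1, x2, y2 = frame
    inside = []
    for m in masks:
        ys, xs = np.nonzero(m)          # foreground pixel coordinates
        if ys.size == 0:
            continue                     # empty mask: nothing to assign
        in_box = (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)
        if in_box.mean() >= min_inside:
            inside.append(m)
    return inside
```

A frame for which this returns more than one mask is exactly the "single human body frame contains more than one human body mask" case handled in the following sections.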
  • In step S13, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask is obtained.
  • The matching relationship between a face and a human body means that the face and the human body belong to the same person, and in the same image, the position of a human body and the position of a face have a strong correlation: a face close to a human body most likely belongs to the same person as that body. Therefore, after the face frame and the human body mask are determined, their positions are determined, and the matching relationship between the face in the face frame and the human body in the human body mask can then be determined based on the position of the face frame and the position of the human body mask.
  • In this way, the matching relationship between the face in the face frame and the human body in the human body mask is obtained based on the position of the face frame and the position of the human body mask. Compared with face and human body matching performed with a face frame and a human body frame, since the human body mask can accurately reflect the position of the human body, matching based on the face frame and the human body mask can accurately obtain the matching relationship between the face and the human body.
  • FIG. 2 is a schematic diagram of a system architecture to which the method for matching a face and a human body according to an embodiment of the present disclosure can be applied. As shown in FIG. 2, the target image acquisition terminal 201 and the control terminal 203 establish a communication connection through the network 202, and the target image acquisition terminal 201 reports at least one face frame and at least one human body mask in the target image to the control terminal 203 through the network 202.
  • The control terminal 203, in response to the at least one face frame and the at least one human body mask, determines the position of the at least one face frame and the position of the at least one human body mask, and determines, based on these positions, the matching relationship between the face in the face frame and the human body in the human body mask. Finally, the control terminal 203 uploads the matching relationship to the network 202 and sends it to the target image acquisition terminal 201 through the network 202. In this way, face and human body matching can be performed based on the face frame and the human body mask, and the matching relationship between the face and the human body can be accurately obtained.
  • the target image acquisition terminal 201 may include an image acquisition device, and the control terminal 203 may include a visual processing device or a remote server with visual information processing capability.
  • Network 202 may employ wired or wireless connections.
  • When the control terminal 203 is a visual processing device, the target image acquisition terminal 201 can be connected to the visual processing device through a wired connection, for example data communication through a bus; when the control terminal 203 is a remote server, the target image acquisition terminal 201 can perform data interaction with the remote server through a wireless network.
  • the target image acquisition terminal 201 may be a vision processing device with a video capture module, or a host with a camera.
  • In some implementations, the method according to the embodiment of the present disclosure may be executed by the target image acquisition terminal 201 itself, in which case the above-mentioned system architecture may not include the network 202 and the control terminal 203.
  • In some embodiments, the determining at least one human body mask in the target image includes: determining at least one human body frame in the target image, and a human body mask in the at least one human body frame; in the case where a single human body frame of the at least one human body frame contains more than one human body mask, determining a target human body mask in the single human body frame; and deleting the human body masks other than the target human body mask from the single human body frame.
  • the human body frame is an area in the image containing the human body.
  • The human body frame is generally a rectangular frame, and the vertices of the rectangle (upper left corner, lower left corner, upper right corner and lower right corner) can be used to represent the precise position of the rectangular frame.
  • the human body frame is the positioning result of tracking and positioning the human body.
  • A sliding window can be used to detect the human body frame in the target image: human body features are detected within the sliding window, and a window in which human body features are detected can be determined as a human body frame.
  • A single human body frame should contain one human body mask, so in the case where a single human body frame contains more than one human body mask, the target human body mask in the single human body frame can be determined, and the other human body masks in that frame can then be deleted. In this way, the resulting human body frame contains only one human body mask.
  • The target human body mask may be the mask of a human body with high image quality. If a human body frame contains multiple human body masks at the same time, it indicates that those human bodies could not be distinguished during human body detection, which suggests that a human body with lower image quality may exist in the frame. Therefore, by removing the mask of the human body with lower image quality, the matching accuracy between the face and the human body can be improved, and the image quality of the matched human body is higher, meeting subsequent usage requirements for the matching result.
  • In some embodiments, the determining at least one human body frame in the target image includes: in the case where the target image includes multiple human body frames, determining a first human body frame with the highest confidence among the multiple human body frames; determining the degree of overlap between the first human body frame and each second human body frame, where a second human body frame is a human body frame other than the first human body frame among the multiple human body frames; and deleting the second human body frames whose degree of overlap is greater than an overlap threshold.
  • During detection, the confidence of the human body features in each window is obtained, and windows whose confidence is higher than a confidence threshold are determined as human body frames. Therefore, there may be multiple human body frames for the same person; in that case, the one human body frame with the highest confidence among the multiple frames of the same person can be retained. For convenience of description, the human body frame with the highest confidence is referred to below as the first human body frame, and the other human body frames are referred to as second human body frames.
  • The degree to which two human body frames coincide can be measured by their overlap degree, and the second human body frames whose overlap degree with the first human body frame is greater than the overlap threshold are then deleted.
  • The overlap degree here may be, for example, the area of the overlapping portion of the two human body frames divided by the sum of the areas of the two frames, or it may simply be the area of the overlapping portion itself.
  • the overlap degree of the human body frame may also be measured by other standards, which is not limited in the embodiment of the present disclosure.
  • In the embodiments of the present disclosure, the first human body frame with the highest confidence among the multiple human body frames is determined, the degree of overlap between the first human body frame and each second human body frame is determined, and the second human body frames whose overlap degree is greater than the overlap threshold are deleted. By removing repeated human body frames in this way, human body frames with higher confidence are obtained, which improves the accuracy of the obtained human body frames and the accuracy of matching the face and the human body.
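The duplicate-frame removal described above is essentially non-maximum suppression. A minimal sketch, using the overlap measure the text gives (intersection area divided by the sum of the two frame areas); the threshold value 0.3 is an illustrative assumption, since with this measure identical boxes score 0.5.

```python
import numpy as np

def nms_body_frames(frames, scores, overlap_threshold=0.3):
    """Repeatedly keep the highest-confidence body frame (the "first"
    frame) and delete the remaining ("second") frames that overlap it
    too much.

    frames: (N, 4) array of [x1, y1, x2, y2] boxes
    scores: (N,) confidence for each box
    Returns the indices of the kept frames.
    """
    order = np.argsort(scores)[::-1]          # highest confidence first
    keep = []
    while order.size > 0:
        first = order[0]                      # first human body frame
        keep.append(int(first))
        rest = order[1:]                      # second human body frames
        # intersection between the first frame and each second frame
        xx1 = np.maximum(frames[first, 0], frames[rest, 0])
        yy1 = np.maximum(frames[first, 1], frames[rest, 1])
        xx2 = np.minimum(frames[first, 2], frames[rest, 2])
        yy2 = np.minimum(frames[first, 3], frames[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_f = (frames[first, 2] - frames[first, 0]) * (frames[first, 3] - frames[first, 1])
        area_r = (frames[rest, 2] - frames[rest, 0]) * (frames[rest, 3] - frames[rest, 1])
        # overlap degree: intersection area / sum of the two areas,
        # one of the measures the text allows
        overlap = inter / (area_f + area_r)
        order = rest[overlap <= overlap_threshold]
    return keep
```

Other overlap measures (e.g. intersection over union, or the raw intersection area) can be swapped in by changing the `overlap` line and the threshold.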
  • the determining the target human body masks in the single human body frame includes: determining two first human body masks with the largest areas in the single human body frame; When the difference value of the areas of the masks is greater than the set threshold, the first human body mask with the larger area among the two first human body masks is used as the target human body mask.
  • When a single human body frame contains multiple human body masks, the areas of the masks can be determined, and the two masks with the largest areas are selected; these two masks are referred to here as the first human body masks.
  • If the areas of the two first human body masks differ greatly, the quality of the mask with the largest area is much better than that of the others; if the areas differ little, yet the two could not be distinguished during human body detection, the quality of both first human body masks is likely poor.
  • In the case where the difference value between the areas of the two first human body masks is greater than the set threshold, the first human body mask with the larger area can be used as the target human body mask. The difference value here is used to reflect how different the areas of the two first human body masks are; it may be, for example, the difference between the two areas, or the ratio of the two areas.
  • The set threshold is a threshold set in advance, and it can be set based on experience; for example, in the case where the difference value is the ratio of the areas of the two first human body masks, the set threshold can be 0.6. When the difference value is greater than the set threshold, the first human body mask with the larger area is used as the target human body mask.
  • In this way, the human body mask with the best image quality in a single human body frame can be quickly acquired, the matching accuracy between the face and the human body can be improved, and the image quality of the matched human body is higher, meeting subsequent usage requirements for the matching result.
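The target-mask selection above can be sketched as follows. The text leaves the exact form of the "difference value" open; this sketch assumes it is the ratio of the smaller area to the larger one compared against the 0.6 example threshold, which is only one reading of that passage.

```python
import numpy as np

def select_target_mask(masks, ratio_threshold=0.6):
    """Pick the target human body mask inside a single body frame.

    masks: list of boolean arrays, the body masks falling in one frame.
    Returns the target mask, or None when the frame and its masks
    should be deleted (areas too similar to tell quality apart).
    """
    if len(masks) == 1:
        return masks[0]
    areas = [int(m.sum()) for m in masks]
    # the two "first human body masks": the two with the largest areas
    order = sorted(range(len(masks)), key=lambda i: areas[i], reverse=True)
    largest, second = order[0], order[1]
    # assumed difference value: smaller area / larger area
    ratio = areas[second] / areas[largest]
    if ratio < ratio_threshold:
        return masks[largest]   # areas differ enough: keep the larger
    return None                 # both likely poor: delete frame and masks
```

A `None` result corresponds to the branch where the single human body frame and its masks are deleted.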
  • the method further includes:
  • in the case where the difference value of the areas of the two first human body masks is not greater than the set threshold, the single human body frame and the human body masks in the single human body frame are deleted.
  • In this way, a human body frame with poor image quality, together with the human body masks in it, is deleted, reducing the interference with the matching of the human bodies in other human body masks, improving the matching accuracy between faces and human bodies, and making the image quality of the matched human body higher so as to meet subsequent usage requirements for the matching result.
• In some embodiments, obtaining the matching relationship between the face in the face frame and the human body in the human body mask includes: obtaining the matching relationship based on the distance between the face frame and the top of the human body mask.
• When a face and a human body belong to the same person, they have a matching relationship, and in the same image the face of a person is usually located at the top of that person's body. Therefore, the matching relationship between the face in the face frame and the human body in the human body mask can be obtained based on the distance between the face frame and the top of the human body mask.
• For example, the face whose face frame is closest to the top of a human body mask is taken as the face matching the human body in that mask.
  • the top of the human body mask may be determined based on the key points of the human body in the human body mask.
• The key points of the human body include the head, limbs, waist and other main parts. Based on these key points, the top of the human body (that is, the direction of the head) can be determined.
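• As a rough sketch of the distance computation described above; it assumes image coordinates with y increasing downward (so the "top" is approximated by the keypoint with the smallest y), and all function names and box conventions are illustrative, not taken from the source:

```python
import math

def mask_top(keypoints):
    """Approximate the top of a body mask (the head direction) as the
    keypoint with the smallest y coordinate."""
    return min(keypoints, key=lambda p: p[1])

def face_to_top_distance(face_box, keypoints):
    """Euclidean distance from the centre of a face frame
    (x1, y1, x2, y2) to the top of a human body mask."""
    cx = (face_box[0] + face_box[2]) / 2
    cy = (face_box[1] + face_box[3]) / 2
    tx, ty = mask_top(keypoints)
    return math.hypot(cx - tx, cy - ty)
```

The face frame minimising this distance for a given mask would then be the candidate match for that mask.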
• In this way, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask is obtained, so that the matching relationship can be determined accurately.
• In some embodiments, obtaining the matching relationship between the face in the face frame and the human body in the human body mask based on the distance between the face frame and the top of the human body mask includes: in the case that the target image contains multiple face frames and multiple human body masks, establishing multiple correspondence sets according to different correspondence modes between the face frames and the human body masks, where a single correspondence set contains a set of one-to-one correspondences between the face frames and the human body masks; determining the matching score of a single correspondence set according to the sum of multiple first distances in that set, where a first distance is the distance between a face frame and the top of the human body mask it corresponds to, and the matching score is negatively correlated with the sum of the first distances; and taking the correspondences in the correspondence set with the largest matching score as the matching relationships between the face frames and the human body masks in the target image.
• When the target image contains multiple face frames and multiple human body masks, a plurality of correspondence sets are established.
• For example, with face frames a, b, c and human body masks A, B, C, the possible matching relationships include: {a-A, b-B, c-C}; {a-A, b-C, c-B}; {a-B, b-A, c-C}; {a-B, b-C, c-A}; {a-C, b-A, c-B}; {a-C, b-B, c-A}.
• Each set of correspondences enclosed in braces is a correspondence set; that is, a single correspondence set includes a set of one-to-one correspondences between each face frame and each human body mask in the target image.
• For a single human body mask, finding the face frame closest to its top is regarded as finding the matching face; that is, for a single human body mask, the closest face-body pair is the optimal face-body matching relationship.
  • the matching score of a single correspondence set can be determined according to the sum of multiple first distances in the single correspondence set.
• The matching score represents whether a single correspondence set is optimal as a whole: the larger the matching score, the smaller the sum of the first distances, and the better the face-body matching relationships in that set are overall.
• Among all correspondence sets, the face-body matching relationships in the set with the largest matching score are the best overall. Therefore, the correspondences in the set with the largest matching score can be used as the matching relationships between the face frames and the human body masks in the target image.
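• For a small number of faces and masks, the score-maximising correspondence set can be found by brute force over all one-to-one assignments. The sketch below (function and variable names are illustrative) enumerates the permutations and takes the matching score as the negated sum of first distances, as described above; its factorial cost is why a flow-based method is preferable for larger inputs:

```python
from itertools import permutations

def best_assignment(dist):
    """dist[i][j]: first distance from face frame i to the top of
    human body mask j (square case: equal counts of faces and masks).

    Returns the permutation mapping face i -> mask perm[i] and the
    matching score (negated sum of first distances) of that set."""
    n = len(dist)
    best, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        # matching score is negatively correlated with the distance sum
        score = -sum(dist[i][perm[i]] for i in range(n))
        if score > best_score:
            best, best_score = perm, score
    return best, best_score
```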
  • the corresponding relationship set with the largest matching score may be determined based on the minimum cost maximum flow algorithm.
• A network is constructed by taking all face frames in the target image as vertices Xi of a bipartite graph and all human body masks as vertices Yi, and adding a source point S and a sink point T. A directed edge with capacity 1 and cost 0 is connected from S to each Xi, and a directed edge with capacity 1 and cost 0 is connected from each Yi to T.
• A directed edge with capacity 1 and cost (-score) is connected from each Xi to each Yj; in this way, the multiple correspondence sets are constructed, where score is the matching score between face frame Xi and human body mask Yj, i.e., the closer the face frame is to the top of the human body mask, the higher the score.
• Constructing this network amounts to constructing the multiple correspondence sets, and finding the correspondence set with the largest matching score amounts to finding the minimum-cost maximum flow of the network. The flow value is the number of matches, and the set of saturated edges forms a feasible solution; since the edge costs are the negated matching scores, minimising the total cost maximises the matching score of the feasible solution, so the minimum-cost maximum flow yields the correspondence set with the largest matching score.
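• The network construction above can be sketched in plain Python. The MinCostMaxFlow class below is a generic successive-shortest-paths implementation (Bellman-Ford based), not the patent's code; the edge cost from Xi to Yj is taken directly as the face-to-mask-top distance, which equals the negated score when score = -distance, and the distance matrix is illustrative:

```python
class MinCostMaxFlow:
    """Minimum-cost maximum flow via successive shortest augmenting
    paths found with Bellman-Ford."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # per edge: [to, cap, cost, rev]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def solve(self, s, t):
        flow = cost = 0
        while True:
            dist = [float("inf")] * self.n
            parent = [None] * self.n
            dist[s] = 0
            for _ in range(self.n - 1):  # Bellman-Ford relaxation
                updated = False
                for u in range(self.n):
                    if dist[u] == float("inf"):
                        continue
                    for i, e in enumerate(self.graph[u]):
                        if e[1] > 0 and dist[u] + e[2] < dist[e[0]]:
                            dist[e[0]] = dist[u] + e[2]
                            parent[e[0]] = (u, i)
                            updated = True
                if not updated:
                    break
            if dist[t] == float("inf"):
                return flow, cost
            v = t
            while v != s:  # all capacities are 1, so augment one unit
                u, i = parent[v]
                edge = self.graph[u][i]
                edge[1] -= 1
                self.graph[v][edge[3]][1] += 1
                v = u
            flow += 1
            cost += dist[t]

# Build the bipartite network described in the text (toy distances).
dist_matrix = [[1, 10], [10, 1]]   # dist_matrix[i][j]: face i -> top of mask j
num_faces = num_masks = 2
S, T = 0, num_faces + num_masks + 1
net = MinCostMaxFlow(num_faces + num_masks + 2)
for i in range(num_faces):
    net.add_edge(S, 1 + i, 1, 0)              # S -> Xi, capacity 1, cost 0
for j in range(num_masks):
    net.add_edge(1 + num_faces + j, T, 1, 0)  # Yj -> T, capacity 1, cost 0
for i in range(num_faces):
    for j in range(num_masks):
        # Xi -> Yj, capacity 1, cost = distance = -(score)
        net.add_edge(1 + i, 1 + num_faces + j, 1, dist_matrix[i][j])
flow, cost = net.solve(S, T)  # flow: number of matches; cost: total distance
```

Here full flow pairs every face with a mask, and the minimum cost corresponds to the correspondence set with the largest matching score.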
• In the embodiments of the present disclosure, when the target image includes multiple face frames and multiple human body masks, multiple correspondence sets are established according to the different correspondence modes between the face frames and the human body masks; the matching score of each correspondence set is then determined according to the sum of its first distances; and finally the correspondences in the set with the largest matching score are used as the matching relationships between the face frames and the human body masks in the target image. The face-body matching relationships in the set with the largest matching score are therefore the best overall, and the obtained matching relationships are more accurate as a whole.
• In some embodiments, the method further includes: storing the matching relationship in a matching relationship database used to store matching relationships between faces and human bodies; in response to a query request for the identity information of a target human body, searching for the target human body in the matching relationship database; when the target human body is found, determining the face that has a matching relationship with it; and determining the identity information of the target human body according to that face.
• The face and human body matching method provided by the embodiments of the present disclosure has high application value in many fields. Based on face recognition, the identity information of a face can be obtained. After the matching relationship between a face and a human body has been established, even when an image collected by a camera contains only a human body, the face matching that body can be looked up, and the identity information of the human body in the collected image can then be determined from the matched face.
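• A minimal sketch of this query flow might look as follows, with a dictionary standing in for the matching relationship database; the class and method names, and the identifiers used in the example, are all assumptions for illustration:

```python
class MatchingRelationDB:
    """Toy stand-in for the matching relationship database: stores
    body-to-face matches and face-to-identity records."""

    def __init__(self):
        self._body_to_face = {}
        self._face_identity = {}

    def store_match(self, body_id, face_id):
        # store the face/body matching relationship
        self._body_to_face[body_id] = face_id

    def register_identity(self, face_id, identity):
        # identity obtained via face recognition
        self._face_identity[face_id] = identity

    def query_identity(self, body_id):
        """Search for the target body; if found, resolve identity
        through the face it matches."""
        face_id = self._body_to_face.get(body_id)
        if face_id is None:
            return None
        return self._face_identity.get(face_id)
```

For example, querying a body seen without a visible face returns the identity registered for its matched face, or None when no match was stored.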
• Below, an implementation of the method for matching a face and a human body provided by the embodiments of the present disclosure is described; it includes the following steps:
• Step S21: Input the target image, and detect the face and the human body in the target image based on instance segmentation to obtain the face frame, the human body frame, and the human body mask in the human body frame.
• Step S22: In the case that a single human body frame contains more than one human body mask, determine the two first human body masks with the largest areas in that frame. If the ratio of the areas of the two first human body masks is greater than 0.6, keep the first human body mask with the larger area and delete the other human body masks; if the ratio is less than 0.6, delete the human body frame together with the human body masks in it.
• In the implementation, the position frame information of the face and the human body can be obtained through a face and human body detection model, and mask information, such as the mask image, area, circumscribed rectangle and confidence, is extracted for the human body in the human body frame through a human body instance segmentation model.
• In practice, a non-maximum suppression algorithm can be used to filter the human body masks: iteratively, the mask frame with the maximum confidence is used to compute its overlap with the other frames, mask frames with large overlap are filtered out, and the retained mask frames are finally obtained, that is, the retained at least one human body mask in the target image.
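• A standard greedy non-maximum suppression over mask frames can be sketched as follows; the IoU-based overlap test and the 0.5 threshold are common defaults assumed here rather than taken from the source:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: repeatedly keep the highest-confidence frame and
    drop remaining frames that overlap it too much.
    Returns the indices of retained frames."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```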
• Filtering of non-primary human body frames is also required. Since the matching method in the embodiments of the present disclosure uses human body masks, that is, one human body corresponds to one mask, when multiple human body masks appear in one human body frame, non-primary human bodies must be filtered out so as not to affect subsequent applications such as face-body matching, partial information retrieval, extraction of overall attributes, and clustering of faces and human bodies across multiple cameras.
• To this end, the areas of the two largest human body masks are compared. If the difference value of the two largest areas is greater than the set threshold, only the mask information of the largest area is retained; otherwise the human body frame is filtered out.
• Step S23: In the case that the target image contains multiple face frames and multiple human body masks, establish multiple correspondence sets according to the different correspondence modes between the face frames and the human body masks, where a single correspondence set includes a set of one-to-one correspondences between each face frame and each human body mask.
• In the implementation, a set of at least one face frame and a set of masked human body frames in the target image are obtained, and the face frames are matched against the masks.
• In the matching process, to obtain more matches and more accurate results, bipartite graph matching is solved with network flow, and the MinCostMaxFlow (minimum-cost maximum-flow) algorithm is used: "more" is guaranteed by the maximum flow, and "more accurate" by the minimum cost.
• Step S24: According to the sum of the multiple first distances in a single correspondence set, determine the matching score of that set; the first distance is the distance between a face frame and the top of the human body mask it corresponds to, and the matching score is negatively correlated with the sum of the first distances.
• Step S25: The correspondences in the correspondence set with the largest matching score are used as the matching relationships between the face frames and the human body masks in the target image.
  • FIG. 3 is a schematic flow chart of a method for matching a face and a human body according to an embodiment of the present disclosure.
• In FIG. 3, 301 denotes detecting the face frames and masked human body frames in the target image; 302 denotes filtering redundant human body frames and the redundant masks within a human body frame, after which it is determined that the target image contains multiple face frames and multiple human body masks; 303 denotes matching the multiple face frames and multiple masks in the target image to determine the matching relationship between faces and human bodies; and 304 denotes outputting the matching relationship.
• FIG. 4 is a schematic diagram of the effect of a method for matching a face and a human body according to an embodiment of the present disclosure, where 401 is an actual human body image captured in a shooting scene, 402 is the matching result of a matching algorithm using face frames and human body frames, and 403 is the matching result of the face and human body matching method of the embodiment of the present disclosure. It can be seen that the matching effect of the method provided by the embodiments of the present disclosure is better than that of the algorithm using face frames and human body frames.
  • FIG. 5 is a schematic diagram of an actual scene of a method for matching a face and a human body according to an embodiment of the present disclosure
  • 501, 504 and 505 are at least one human body frame in the target image
  • 502 is the face frame in the target image
• 503 is the human body mask in the target image; the matching relationship between the face and the human body can be obtained by performing matching according to the method for matching a face and a human body of an embodiment of the present disclosure.
• In the embodiments of the present disclosure, the matching relationship between the faces in the face frames and the human bodies in the human body masks is obtained. Compared with face-body matching performed with face frames and human body frames, matching based on face frames and human body masks can obtain the matching relationship between faces and human bodies accurately, since a human body mask precisely reflects the position of the human body.
• It can be understood that the embodiments of the present disclosure also provide a face and human body matching apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any of the face and human body matching methods provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding parts of the method section.
  • FIG. 6 shows a block diagram of an apparatus for matching a human face and a human body according to an embodiment of the present disclosure.
  • the apparatus 60 includes:
  • a face frame determination unit 61 configured to determine at least one face frame in the target image
  • a human body mask determination unit 62 configured to determine at least one human body mask in the target image
  • the matching relationship determining unit 63 is configured to obtain a matching relationship between the face in the face frame and the human body in the human mask based on the position of the face frame and the position of the human body mask.
  • the human body mask determining unit 62 includes:
  • a human body frame determination unit configured to determine at least one human body frame in the target image and a human body mask in the at least one human body frame
  • a target human body mask determination unit configured to determine a target human body mask in the single human body frame when a single human body frame in the at least one human body frame contains more than one human body mask
  • the human body mask deletion unit is configured to delete other human body masks other than the target human body mask in the single human body frame.
  • the target human body mask determination unit includes:
  • a first human body mask determination subunit configured to determine two first human body masks with the largest area in a single human body frame
• the target human body mask determination subunit is configured to, when the difference value of the areas of the two first human body masks is greater than the set threshold, use the first human body mask with the larger area of the two as the target human body mask.
  • the apparatus further includes:
  • the human body frame deletion unit is configured to delete the single human body frame and the human body mask in the single human body frame when the difference value of the areas of the two first human body masks is not greater than the set threshold.
  • the human body frame determination unit includes:
  • a first human body frame determination unit configured to determine a first human body frame with the highest confidence in the plurality of human body frames when the target image includes multiple human body frames
  • an overlapping degree determination unit configured to determine the degree of overlap between the first human body frame and each second human body frame, and the second human body frame is a human body frame other than the first human body frame in the plurality of human body frames;
  • a second human body frame deletion unit configured to delete the second human body frame whose overlap degree is greater than the overlap degree threshold in the second human body frame
• the at least one human body frame determination unit is configured to determine the second human body frames remaining after the deletion, together with the first human body frame, as the at least one human body frame.
• the matching relationship determining unit 63 is configured to obtain, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask.
  • the matching relationship determining unit 63 includes:
  • the corresponding relationship set establishment unit is configured to, when the target image includes multiple face frames and multiple human body masks, establish multiple correspondences according to different correspondence modes between the respective face frames and the respective human body masks A relationship set, wherein a single corresponding relationship set contains a set of one-to-one correspondence between each face frame and each human body mask;
• the matching score determination unit is configured to determine the matching score of a single correspondence set according to the sum of multiple first distances in the set, where the first distance is the distance between a face frame and the top of the human body mask it corresponds to, and the matching score is negatively correlated with the sum of the first distances;
  • the matching relationship determining subunit is configured to take the corresponding relationship in the corresponding relationship set with the largest matching score as the matching relationship between each face frame and each human body mask in the target image.
  • the apparatus further includes:
  • a storage unit configured to store the matching relationship in a matching relationship library, where the matching relationship library is used to store the matching relationship between a human face and a human body;
  • a search unit configured to search for the target human body in the matching relationship database in response to a query request for the identity information of the target human body
  • a face determination unit configured to determine a face that has a matching relationship with the target human body when the target human body is found
  • the identity information determination unit is configured to determine the identity information of the target human body according to the face.
• the functions or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their implementation and technical effects, refer to the descriptions of the above method embodiments.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, which stores computer program instructions, and when the computer program instructions are executed by a processor, realizes the above-mentioned method for matching a human face and a human body.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
• An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to call the instructions stored in the memory to execute the above-mentioned face and human body matching method.
• Embodiments of the present disclosure also provide a computer program product including computer-readable code; when the code runs on a device, a processor in the device executes instructions for implementing the face and human body matching method provided by any of the above embodiments.
  • Embodiments of the present disclosure further provide another computer program product for storing computer-readable instructions, which, when executed, cause the computer to perform operations of the face and human body matching method provided by any of the foregoing embodiments.
  • the electronic device may be provided as a terminal, server or other form of device.
  • FIG. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
• The electronic device 800 may be a terminal such as a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, fitness device, or personal digital assistant.
  • an electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814 , and the communication component 816 .
  • the processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 802 can include one or more processors 820 to execute instructions to perform all or some of the steps of the methods described above.
  • processing component 802 may include one or more modules that facilitate interaction between processing component 802 and other components.
  • processing component 802 may include a multimedia module to facilitate interaction between multimedia component 808 and processing component 802.
  • Memory 804 is configured to store various types of data to support operation at electronic device 800 . Examples of such data include instructions for any application or method operating on electronic device 800, contact data, phonebook data, messages, pictures, videos, and the like. Memory 804 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable Programmable Read Only Memory (EPROM), Programmable Read Only Memory (PROM), Read Only Memory (ROM), Magnetic Memory, Flash Memory, Magnetic or Optical Disk.
  • Power supply assembly 806 provides power to various components of electronic device 800 .
  • Power supply components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 800 .
  • Multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, swipe, and gestures on the touch panel. The touch sensor may not only sense the boundaries of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe action.
  • the multimedia component 808 includes a front-facing camera and/or a rear-facing camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each of the front and rear cameras can be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 810 is configured to output and/or input audio signals.
  • audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when electronic device 800 is in operating modes, such as call mode, recording mode, and voice recognition mode.
  • the received audio signal may be stored in memory 804 or transmitted via communication component 816 .
  • audio component 810 also includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to: home button, volume buttons, start button, and lock button.
  • Sensor assembly 814 includes one or more sensors for providing status assessment of various aspects of electronic device 800 .
• The sensor assembly 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800; the sensor assembly 814 can also detect a change in the position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and changes in its temperature.
  • Sensor assembly 814 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact.
  • Sensor assembly 814 may also include a light sensor, such as a complementary metal oxide semiconductor (CMOS) or charge coupled device (CCD) image sensor, for use in imaging applications.
  • the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 816 is configured to facilitate wired or wireless communication between electronic device 800 and other devices.
  • the electronic device 800 may access a wireless network based on a communication standard, such as wireless network (WiFi), second generation mobile communication technology (2G) or third generation mobile communication technology (3G), or a combination thereof.
  • the communication component 816 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
• The electronic device 800 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above method.
  • a non-volatile computer-readable storage medium such as memory 804 comprising computer program instructions executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided.
  • FIG. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server.
  • the electronic device 1900 includes a processing component 1922, which includes one or more processors, and a memory resource, represented by memory 1932, for storing instructions executable by the processing component 1922, such as an application program.
  • An application program stored in memory 1932 may include one or more modules, each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described methods.
  • the electronic device 1900 may also include a power supply assembly 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input output (I/O) interface 1958 .
• The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system introduced by Apple (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or similar.
  • a non-volatile computer-readable storage medium such as memory 1932 comprising computer program instructions executable by processing component 1922 of electronic device 1900 to perform the above-described method.
  • Embodiments of the present disclosure may be systems, methods and/or computer program products.
  • the computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the embodiments of the present disclosure.
  • a computer-readable storage medium may be a tangible device that can hold and store instructions for use by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
• A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above.
  • Computer-readable storage media are not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
  • the computer readable program instructions described herein may be downloaded to various computing/processing devices from a computer readable storage medium, or to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
  • the computer program instructions for carrying out the operations of the disclosed embodiments may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages.
  • the computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • in some embodiments, electronic circuitry, such as programmable logic circuits, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), can be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement various aspects of the embodiments of the present disclosure.
  • These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • These computer-readable program instructions may also be stored in a computer-readable storage medium; these instructions cause a computer, programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored comprises an article of manufacture including instructions that implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment to cause a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executing on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
  • each block in the flowcharts or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the computer program product can be implemented in hardware, software or a combination thereof.
  • in one optional embodiment, the computer program product is embodied as a computer storage medium, and in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK).
  • the embodiments of the present disclosure disclose a method, an apparatus, an electronic device, a storage medium and a program for matching a human face with a human body.
  • the method includes: determining at least one face frame in the target image; determining at least one human body mask in the target image; and obtaining, based on the position of the face frame and the position of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask. In this way, the embodiments of the present disclosure can improve the accuracy of matching the human face with the human body.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide a method, an apparatus, an electronic device, a storage medium, and a program for matching a human face with a human body. The method includes: determining at least one face frame in a target image; determining at least one human body mask in the target image; and obtaining, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask. In this way, the embodiments of the present disclosure can improve the accuracy of face-body matching.

Description

Method, apparatus, electronic device, storage medium, and program for matching a human face with a human body
CROSS-REFERENCE TO RELATED APPLICATION
This patent application claims priority to Chinese Patent Application No. 202110321139.6, filed on March 25, 2021 by the applicant Shenzhen SenseTime Technology Co., Ltd. and entitled "Method and apparatus for matching a human face with a human body, electronic device, and storage medium", which is incorporated into the present disclosure by reference.
TECHNICAL FIELD
Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method, an apparatus, an electronic device, a storage medium, and a program for matching a human face with a human body.
BACKGROUND
Techniques for determining a person's identity from image information are becoming increasingly mature, and matching a face with a body allows the identity to be determined more accurately. During face-body matching, a detected face and a detected body are associated as belonging to the same person.
Application scenarios for face-body matching are increasingly widespread. For example, in an intelligent security system, because of issues such as the number and placement of cameras and the quality of the image information, it is difficult to capture every face; at some moments only a body may be captured. Even though no clear face is captured, the captured body can still be retrieved from a face-body association database; once a matching body is found, the associated face information is obtained, thereby determining the identity information of that body.
In a face-body association database, faces and bodies are associated in advance, and the association is established through face-body matching; existing matching approaches yield relatively low face-body matching accuracy.
SUMMARY
Embodiments of the present disclosure propose a method, an apparatus, an electronic device, a storage medium, and a program for matching a human face with a human body.
Embodiments of the present disclosure provide a method for matching a human face with a human body, the method being performed by an electronic device and including:
determining at least one face frame in a target image;
determining at least one human body mask in the target image;
obtaining, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask. In this way, performing face-body matching based on the face frame and the human body mask allows the matching relationship between the face and the body to be obtained accurately.
In some embodiments, determining the at least one human body mask in the target image includes:
determining at least one body frame in the target image, and the human body masks in the at least one body frame;
in a case where a single body frame of the at least one body frame contains more than one human body mask, determining a target human body mask in the single body frame;
deleting human body masks other than the target human body mask in the single body frame. In this way, when a single body frame contains more than one human body mask, the target human body mask in that frame can be determined and the other masks deleted, so that the resulting body frame contains only one human body mask, which improves the accuracy of face-body matching.
In some embodiments, determining the target human body mask in the single body frame includes:
determining the two first human body masks with the largest areas in the single body frame;
in a case where the difference value between the areas of the two first human body masks is greater than a set threshold, taking the first human body mask with the larger area as the target human body mask. In this way, the human body mask with the best image quality in a single body frame can be obtained quickly, which improves the accuracy of face-body matching and yields matched bodies of higher image quality to satisfy subsequent uses of the matching result.
In some embodiments, after determining the two first human body masks with the largest areas in the single body frame, the method further includes:
in a case where the difference value between the areas of the two first human body masks is not greater than the set threshold, deleting the single body frame and the human body masks in it. In this way, body frames of poor image quality and their masks can be deleted, reducing their influence when matching the bodies in other masks with faces, improving the accuracy of face-body matching, and yielding matched bodies of higher image quality to satisfy subsequent uses of the matching result.
In some embodiments, determining the at least one body frame in the target image includes:
in a case where the target image contains multiple body frames, determining a first body frame with the highest confidence among the multiple body frames;
determining the overlap between the first body frame and each second body frame, the second body frames being the body frames other than the first body frame among the multiple body frames;
deleting the second body frames whose overlap is greater than an overlap threshold;
determining the remaining second body frames and the first body frame as the at least one body frame. In this way, by removing duplicate body frames, body frames with higher confidence are obtained, which improves the accuracy of the resulting body frames and of face-body matching.
In some embodiments, obtaining, based on the position of the face frame and the position of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask includes:
obtaining, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask. In this way, the matching relationship between the face and the body can be obtained accurately.
In some embodiments, obtaining, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask includes:
in a case where the target image contains multiple face frames and multiple human body masks, establishing multiple correspondence sets according to different ways of corresponding the face frames to the human body masks, a single correspondence set containing one group of one-to-one correspondences between the face frames and the human body masks;
determining the match score of a single correspondence set according to the sum of multiple first distances in the set, a first distance being the distance between a corresponding face frame and the top of the corresponding human body mask, the match score being negatively correlated with the sum of the first distances;
taking the correspondences in the correspondence set with the largest match score as the matching relationships between the face frames and the human body masks in the target image. In this way, the face-body matches in the correspondence set with the largest match score are optimal as a whole, and the obtained face-body matching relationships are more accurate overall.
In some embodiments, after obtaining the matching relationship between the face in the face frame and the human body in the human body mask, the method further includes:
storing the matching relationship in a matching-relationship database, the matching-relationship database being used to store matching relationships between faces and bodies;
in response to an identity-information query request for a target body, searching the matching-relationship database for the target body;
in a case where the target body is found, determining the face that has a matching relationship with the target body;
determining the identity information of the target body according to the face. In this way, after the face-body matching relationship has been obtained, when an image captured by a camera contains only a body, the face that has a matching relationship with that body can be looked up based on the body, and the identity information is then determined from the matched face, i.e., the identity information of the body in the captured image is determined.
Embodiments of the present disclosure provide an apparatus for matching a human face with a human body, including:
a face frame determination unit configured to determine at least one face frame in a target image;
a human body mask determination unit configured to determine at least one human body mask in the target image;
a matching relationship determination unit configured to obtain, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask.
In some embodiments, the human body mask determination unit includes:
a body frame determination unit configured to determine at least one body frame in the target image, and the human body masks in the at least one body frame;
a target human body mask determination unit configured to, in a case where a single body frame of the at least one body frame contains more than one human body mask, determine a target human body mask in the single body frame;
a human body mask deletion unit configured to delete human body masks other than the target human body mask in the single body frame.
In some embodiments, the target human body mask determination unit includes:
a first human body mask determination subunit configured to determine the two first human body masks with the largest areas in a single body frame;
a target human body mask determination subunit configured to, in a case where the difference value between the areas of the two first human body masks is greater than a set threshold, take the first human body mask with the larger area as the target human body mask.
In some embodiments, the apparatus further includes:
a body frame deletion unit configured to, in a case where the difference value between the areas of the two first human body masks is not greater than the set threshold, delete the single body frame and the human body masks in the single body frame.
In some embodiments, the body frame determination unit includes:
a first body frame determination unit configured to, in a case where the target image contains multiple body frames, determine a first body frame with the highest confidence among the multiple body frames;
an overlap determination unit configured to determine the overlap between the first body frame and each second body frame, the second body frames being the body frames other than the first body frame among the multiple body frames;
a second body frame deletion unit configured to delete the second body frames whose overlap is greater than an overlap threshold;
an at-least-one-body-frame determination unit configured to determine the remaining second body frames and the first body frame as the at least one body frame.
In some embodiments, the matching relationship determination unit is configured to obtain, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask.
In some embodiments, the matching relationship determination unit includes:
a correspondence set establishment unit configured to, in a case where the target image contains multiple face frames and multiple human body masks, establish multiple correspondence sets according to different ways of corresponding the face frames to the human body masks, a single correspondence set containing one group of one-to-one correspondences between the face frames and the human body masks;
a match score determination unit configured to determine the match score of a single correspondence set according to the sum of multiple first distances in the set, a first distance being the distance between a corresponding face frame and the top of the corresponding human body mask, the match score being negatively correlated with the sum of the first distances;
a matching relationship determination subunit configured to take the correspondences in the correspondence set with the largest match score as the matching relationships between the face frames and the human body masks in the target image.
In some embodiments, the apparatus further includes:
a storage unit configured to store the matching relationship in a matching-relationship database, the matching-relationship database being used to store matching relationships between faces and bodies;
a search unit configured to, in response to an identity-information query request for a target body, search the matching-relationship database for the target body;
a face determination unit configured to, in a case where the target body is found, determine the face that has a matching relationship with the target body;
an identity information determination unit configured to determine the identity information of the target body according to the face.
Embodiments of the present disclosure provide an electronic device, including: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Embodiments of the present disclosure provide a computer-readable storage medium on which computer program instructions are stored, the computer program instructions implementing the above method when executed by a processor.
In the embodiments of the present disclosure, at least one face frame and at least one human body mask in a target image are determined, and the matching relationship between the face in the face frame and the body in the human body mask is then obtained based on the position of the face frame and the position of the human body mask. Compared with performing face-body matching using a face frame and a body frame, since the human body mask can accurately reflect the position of the body, performing face-body matching based on the face frame and the human body mask allows the matching relationship between the face and the body to be obtained accurately.
It should be understood that the above general description and the following detailed description are exemplary and explanatory only and do not limit the embodiments of the present disclosure. Other features and aspects of the embodiments of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present disclosure or the background art more clearly, the drawings required by the embodiments or the background art are described below.
The drawings here are incorporated into and form part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the technical solutions of the embodiments of the present disclosure.
Fig. 1 shows a flowchart of a method for matching a human face with a human body according to an embodiment of the present disclosure;
Fig. 2 shows a schematic diagram of a system architecture for the method for matching a human face with a human body according to an embodiment of the present disclosure;
Fig. 3 shows a schematic flow diagram of a scheme of the method for matching a human face with a human body according to an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of the effect of the method for matching a human face with a human body according to an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of an actual scenario of the method for matching a human face with a human body according to an embodiment of the present disclosure;
Fig. 6 shows a block diagram of an apparatus for matching a human face with a human body according to an embodiment of the present disclosure;
Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 8 shows a block diagram of an electronic device according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
Various exemplary embodiments, features, and aspects of the present disclosure are described in detail below with reference to the accompanying drawings. The same reference numerals in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" used here means "serving as an example, embodiment, or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, both A and B exist, and B exists alone. In addition, the term "at least one" herein indicates any one of multiple items or any combination of at least two of multiple items; for example, including at least one of A, B, and C may indicate including any one or more elements selected from the set consisting of A, B, and C.
In addition, numerous specific details are given in the following detailed description to better explain the embodiments of the present disclosure. Those skilled in the art should understand that the embodiments of the present disclosure can also be practiced without certain specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art are not described in detail so as to highlight the gist of the embodiments of the present disclosure.
Face-body matching can be performed on the persons in a single image; when a person in the image shows both a face and a body, the face and the body can be matched conveniently and quickly. In the related art, face detection and body detection are typically used to obtain a face frame and a body frame, and face-body matching is then performed using the face frame and the body frame. However, the body frame often cannot accurately reflect the position of the body, or multiple bodies may appear within one body frame; as a result, the matching results obtained in relatively complex scenes have low accuracy.
In the embodiments of the present disclosure, at least one face frame and at least one human body mask in a target image are determined, and the matching relationship between the face in the face frame and the body in the human body mask is then obtained based on the position of the face frame and the position of the human body mask. Compared with performing face-body matching using a face frame and a body frame, since the human body mask can accurately reflect the position of the body, performing face-body matching based on the face frame and the human body mask allows the matching relationship between the face and the body to be obtained accurately.
The face-body matching method provided by the embodiments of the present disclosure has high application value in many fields. For example, when detecting a target object in a crowded scene, or when the face is occluded, the face of the target object can be obtained from the captured body of the target object and the pre-established face-body matching relationship, and the identity of the target object can then be determined from the face.
In one possible implementation, the face-body matching method may be performed by an electronic device such as a terminal device or a server. The terminal device may be user equipment (User Equipment, UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. The method may be implemented by a processor invoking computer-readable instructions stored in a memory, or may be performed by a server.
For ease of description, in one or more embodiments of this specification, the entity performing the face-body matching method may be a face-body matching device; the following describes implementations of the method with a face-body matching device as the performing entity. It can be understood that taking a face-body matching device as the performing entity is merely an exemplary illustration and should not be construed as limiting the method.
Fig. 1 shows a flowchart of a method for matching a human face with a human body according to an embodiment of the present disclosure. As shown in Fig. 1, the method includes:
In step S11, at least one face frame in a target image is determined.
A face frame is a region of the image containing a face. A face frame is generally a rectangular frame, and the detailed position of the rectangle can be represented by its vertices (top-left, bottom-left, top-right, and bottom-right corners).
A face frame is the localization result of tracking and locating a face, and a face can be located in multiple ways. For example, face frames in the target image can be determined with a sliding window: facial features are detected within the sliding window, and a window in which facial features are detected is determined as a face frame. The facial features may, for example, be face keypoints, i.e., key points on the face such as the eyes (e.g., eye corners, eyeball centers, tails of the eyes), nose (e.g., nose tip, nose wings), mouth (e.g., lips, lip corners, lip edges), chin, and eyebrow corners; based on the detection of these keypoints, the face frame can be located.
In some embodiments, one or more face frames may be determined in the target image.
In step S12, at least one human body mask in the target image is determined.
A human body mask indicates the region where the contour of a body lies. Human body masks can be obtained by instance segmentation, which, on top of semantic segmentation, distinguishes different individuals of the same class of objects; therefore, each determined human body mask corresponds to a single body, i.e., one human body mask corresponds to one body.
Instance segmentation can be implemented based on object detection and semantic segmentation techniques. For example, bodies are first detected in the target image, and the pixels corresponding to each body are then labeled, so that every body in the image can be distinguished and annotated. For instance, object detection yields body frames indicating where the bodies are; on the basis of the body frames, the contour region of each body within the frame is segmented out, which yields the human body masks.
In step S13, based on the position of the face frame and the position of the human body mask, the matching relationship between the face in the face frame and the body in the human body mask is obtained.
A matching relationship between a face and a body indicates that they belong to the same person. In a single image, the position of a body and the position of a face are strongly correlated; in general, the face closest to a body most likely belongs to the same person. Therefore, after the face frames and the human body masks have been determined, i.e., after their positions are known, the matching relationship between the faces in the face frames and the bodies in the human body masks can be determined based on the positions of the face frames and the positions of the human body masks.
In the embodiments of the present disclosure, at least one face frame and at least one human body mask in a target image are determined, and the matching relationship between the face in the face frame and the body in the human body mask is then obtained based on the position of the face frame and the position of the human body mask. Compared with performing face-body matching using a face frame and a body frame, since the human body mask can accurately reflect the position of the body, performing face-body matching based on the face frame and the human body mask allows the matching relationship between the face and the body to be obtained accurately.
Fig. 2 is a schematic diagram of a system architecture to which the face-body matching method of the embodiments of the present disclosure can be applied. As shown in Fig. 2, the system architecture includes: a target image acquisition terminal 201, a network 202, and a control terminal 203. To support an exemplary application, the target image acquisition terminal 201 and the control terminal 203 establish a communication connection through the network 202. The target image acquisition terminal 201 reports at least one face frame and at least one human body mask in the target image to the control terminal 203 through the network 202. In response, the control terminal 203 determines the position of the at least one face frame and the position of the at least one human body mask, and determines, based on these positions, the matching relationship between the face in the face frame and the body in the human body mask. Finally, the control terminal 203 uploads the matching relationship to the network 202 and sends it to the target image acquisition terminal 201 through the network 202. Face-body matching can thus be performed based on the face frames and the human body masks, and the matching relationship between faces and bodies can be obtained accurately.
As an example, the target image acquisition terminal 201 may include an image capture device, and the control terminal 203 may include a vision processing device with visual information processing capability or a remote server. The network 202 may use a wired or wireless connection. When the control terminal 203 is a vision processing device, the target image acquisition terminal 201 may communicate with the vision processing device through a wired connection, e.g., data communication over a bus; when the control terminal 203 is a remote server, the target image acquisition terminal 201 may exchange data with the remote server through a wireless network.
Alternatively, in some scenarios, the target image acquisition terminal 201 may be a vision processing device with a video capture module, e.g., a host with a camera. In that case, the method of the embodiments of the present disclosure may be performed by the target image acquisition terminal 201, and the above system architecture may not include the network 202 and the control terminal 203.
In one possible implementation, determining the at least one human body mask in the target image includes: determining at least one body frame in the target image, and the human body masks in the at least one body frame; in a case where a single body frame of the at least one body frame contains more than one human body mask, determining a target human body mask in the single body frame; and deleting human body masks other than the target human body mask in the single body frame.
A body frame is a region of the image containing a body. A body frame is generally a rectangular frame, and the detailed position of the rectangle can be represented by its vertices (top-left, bottom-left, top-right, and bottom-right corners).
A body frame is the localization result of tracking and locating a body, and a body can be located in multiple ways. For example, body frames in the target image can be detected with a sliding window: body features are detected within the sliding window, and a window in which body features are detected is determined as a body frame.
A single body frame should contain one human body mask. Therefore, when a single body frame contains more than one human body mask, the target human body mask in the single body frame can be determined, and the other human body masks in the frame are then deleted, so that the resulting body frame contains only one human body mask.
In the embodiments of the present disclosure, the target human body mask may be the mask of a body with relatively high image quality. If one body frame contains multiple human body masks at the same time, the body detection process failed to separate these bodies, which indicates that the frame may contain bodies of low image quality. Therefore, removing the masks of low-quality bodies improves the accuracy of face-body matching and yields matched bodies of higher image quality to satisfy subsequent uses of the matching result.
For the same person in the target image, multiple body frames may enclose that person. In this case, the multiple body frames of the same person can be deduplicated, keeping only one body frame. In one possible implementation, determining the at least one body frame in the target image includes: in a case where the target image contains multiple body frames, determining a first body frame with the highest confidence among the multiple body frames; determining the overlap between the first body frame and each second body frame, the second body frames being the body frames other than the first body frame among the multiple body frames; and deleting the second body frames whose overlap is greater than an overlap threshold.
In the process of detecting body features with a sliding window, a confidence that the window contains body features is obtained, and windows whose confidence is higher than a confidence threshold are determined as body frames. As a result, the same person may end up enclosed by multiple body frames; in that case, the single body frame with the highest confidence among the frames enclosing the same person can be kept. For ease of description, the body frame with the highest confidence is referred to below as the first body frame.
The multiple body frames of the same person usually overlap; therefore, the body frames that overlap substantially with the highest-confidence frame can be deleted. For ease of description, a body frame overlapping the highest-confidence frame is referred to below as a second body frame. The degree of overlap between body frames can be measured with an overlap metric, and the second body frames whose overlap is greater than an overlap threshold are then deleted.
The overlap here may, for example, be the area of the overlapping part of the two body frames divided by the sum of the areas of the two body frames, or it may be the value of the area of the overlapping part. Of course, other criteria can also be used to measure the overlap between body frames, which is not limited in the embodiments of the present disclosure.
In the embodiments of the present disclosure, when the same person corresponds to multiple body frames, the first body frame with the highest confidence is determined among the multiple body frames, the overlap between the first body frame and each second body frame is then determined, and the second body frames whose overlap is greater than the overlap threshold are deleted. Removing the duplicate body frames in this way yields body frames with higher confidence, improving the accuracy of the obtained body frames and of face-body matching.
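The deduplication rule above (keep the highest-confidence body frame, drop the other frames that overlap it too much) can be sketched in Python. The box format (x1, y1, x2, y2), the confidence list, and the 0.3 threshold are illustrative assumptions; the overlap metric follows the text's definition of intersection area divided by the sum of the two frames' areas.

```python
def overlap(box_a, box_b):
    """Overlap as defined in the text: intersection area divided by the
    sum of the two boxes' areas. Boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area(box_a) + area(box_b))


def dedup_body_boxes(boxes, scores, threshold=0.3):
    """Keep the highest-confidence (first) body frame and drop every other
    frame whose overlap with it exceeds the threshold; the survivors plus
    the first frame become the retained body frames."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    first = order[0]
    kept = [first]
    for i in order[1:]:
        if overlap(boxes[first], boxes[i]) <= threshold:
            kept.append(i)
    return [boxes[i] for i in kept]
```

A fully iterative non-maximum suppression (as in step S22's filtering) would repeat this with the next-highest-confidence survivor; one round suffices to illustrate the first-frame rule described here.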
In one possible implementation, determining the target human body mask in the single body frame includes: determining the two first human body masks with the largest areas in the single body frame; and in a case where the difference value between the areas of the two first human body masks is greater than a set threshold, taking the first human body mask with the larger area as the target human body mask.
Considering that a larger area means higher image quality of the body, when a single body frame contains multiple human body masks, the areas of these masks can be determined and the two masks with the largest areas selected. For ease of the following description, the two masks with the largest areas are referred to as the first human body masks.
If the areas of the two first human body masks differ greatly, the quality of the largest mask is far better than that of the other masks; if the areas of the two first human body masks are close and body detection cannot separate the two, the quality of both first human body masks is poor.
Therefore, in a case where the difference value between the areas of the two first human body masks is greater than a set threshold, the first human body mask with the larger area can be taken as the target human body mask. The difference value here reflects the degree of difference between the areas of the two first human body masks; it may, for example, be the difference between the two areas, or the ratio of the two areas.
The set threshold is a threshold set in advance and can be chosen based on experience; for example, when the difference value is the ratio of the areas of the two first human body masks, the threshold may be 0.6.
In the embodiments of the present disclosure, the two first human body masks with the largest areas in a single body frame are determined, and when the difference value between their areas is greater than the set threshold, the mask with the larger area is taken as the target human body mask. In this way, the human body mask with the best image quality in a single body frame can be obtained quickly, improving the accuracy of face-body matching and yielding matched bodies of higher image quality to satisfy subsequent uses of the matching result.
In one possible implementation, after determining the two first human body masks with the largest areas in the single body frame, the method further includes:
in a case where the difference value between the areas of the two first human body masks is not greater than the set threshold, deleting the single body frame and the human body masks in the single body frame.
If the areas of the two first human body masks are close and body detection cannot separate the two, the quality of both first human body masks is poor; the body frame and the human body masks in it can therefore be deleted, reducing their influence when matching the bodies in other masks with faces, improving the accuracy of face-body matching, and yielding matched bodies of higher image quality to satisfy subsequent uses of the matching result.
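As a sketch, the mask-selection rule above can be written as a small function. Interpreting the "difference value" as the ratio of the second-largest area to the largest (so a small ratio means one mask clearly dominates) is an assumption on our part, as is the 0.6 threshold taken from the example in the text.

```python
def filter_masks_in_box(mask_areas, ratio_threshold=0.6):
    """Given the pixel areas of all human body masks inside one body frame,
    return the index of the target mask to keep, or None if the whole frame
    (and its masks) should be discarded as ambiguous."""
    if len(mask_areas) == 1:
        return 0
    # The two first human body masks: those with the largest areas.
    order = sorted(range(len(mask_areas)), key=lambda i: mask_areas[i], reverse=True)
    largest, second = order[0], order[1]
    if mask_areas[second] / mask_areas[largest] < ratio_threshold:
        return largest   # one mask clearly dominates: keep it, delete the rest
    return None          # similar sizes, both likely poor quality: drop the frame
```

With a dominant mask, e.g. areas [1000, 100], the largest is kept; with close areas, e.g. [1000, 900], the frame is dropped entirely.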
In one possible implementation, obtaining, based on the position of the face frame and the position of the human body mask, the matching relationship between the face in the face frame and the body in the human body mask includes: obtaining, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the body in the human body mask.
Since a face and a body that belong to the same person have a matching relationship, and in a single image the face of a person is usually located at the top of the body, the matching relationship between the face in the face frame and the body in the human body mask can be obtained based on the distance between the face frame and the top of the human body mask. For example, the face in the face frame closest to the top of a human body mask is taken as the face matching the body in that mask.
In one possible implementation, the top of a human body mask can be determined based on the keypoints of the body in the mask. The body keypoints include major body parts such as the head, limbs, and waist; based on the body keypoints, the top of the body (i.e., the direction in which the head lies) can be determined.
In the embodiments of the present disclosure, the matching relationship between the face in the face frame and the body in the human body mask is obtained based on the distance between the face frame and the top of the human body mask, so the matching relationship between faces and bodies can be determined accurately.
In one possible implementation, obtaining, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the body in the human body mask includes: in a case where the target image contains multiple face frames and multiple human body masks, establishing multiple correspondence sets according to different ways of corresponding the face frames to the human body masks, a single correspondence set containing one group of one-to-one correspondences between the face frames and the human body masks; determining the match score of a single correspondence set according to the sum of multiple first distances in the set, a first distance being the distance between a corresponding face frame and the top of the corresponding human body mask, the match score being negatively correlated with the sum of the first distances; and taking the correspondences in the correspondence set with the largest match score as the matching relationships between the face frames and the human body masks in the target image.
When the target image contains multiple face frames and multiple human body masks, there are multiple possible ways to correspond faces and bodies; multiple correspondence sets can therefore be established according to the different ways of corresponding the face frames to the human body masks. For example, if the target image contains face frames a, b, c and human body masks A, B, C, the possible matching relationships include: {a-A, b-B, c-C}; {a-A, b-C, c-B}; {a-B, b-A, c-C}; {a-B, b-C, c-A}; {a-C, b-A, c-B}; {a-C, b-B, c-A}. A group of correspondences in {} is one correspondence set, and a single correspondence set contains one group of one-to-one correspondences between the face frames and the human body masks in the target image.
For a single human body mask, finding the face frame closest to its top amounts to finding its matching face, i.e., for a single mask, the closest face-body pair is the optimal face-body match. For a single correspondence set, however, the multiple face-body matches in the set as a whole must be optimal.
To make the multiple face-body matches in a single correspondence set optimal as a whole, the match score of a single correspondence set can be determined according to the sum of the multiple first distances in the set, a first distance being the distance between a corresponding face frame and the top of the corresponding human body mask. The match score characterizes whether the correspondence set as a whole is optimal: the larger the match score, the smaller the sum of the first distances, and the smaller that sum, the better the face-body matches in the set as a whole.
The face-body matches in the correspondence set with the largest match score are then optimal as a whole; therefore, the correspondences in that set can be taken as the matching relationships between the face frames and the human body masks in the target image.
In one possible implementation, the correspondence set with the largest match score can be determined with the minimum-cost maximum-flow algorithm. First, a network is constructed: all face frames in the target image are taken as vertices Xi of a bipartite graph, all human body masks as vertices Yi, and a source S and a sink T are established. A directed edge with capacity 1 and cost 0 is connected from S to each Xi, and a directed edge with capacity 1 and cost 0 from each Yi to T. From each Xi to each Yj, a directed edge with capacity 1 and cost (-score) is connected, which constructs the multiple correspondence sets; the value of score is the match score between each human body mask and each face frame, i.e., the closer the face frame position is to the top of the human body mask, the higher the score.
Constructing this network amounts to constructing the multiple correspondence sets, and solving for the correspondence set with the largest match score amounts to computing the minimum-cost maximum flow of the constructed network. The flow is the number of matches, and the saturated edges form a feasible solution; taking the negative of the minimum-cost match score, i.e., maximizing the match score of each feasible solution, yields the correspondence set with the largest match score.
In the embodiments of the present disclosure, when the target image contains multiple face frames and multiple human body masks, multiple correspondence sets are established according to the different ways of corresponding the face frames to the human body masks; the match score of each correspondence set is determined according to the sum of the multiple first distances in it, and the correspondences in the set with the largest match score are finally taken as the matching relationships between the face frames and the human body masks in the target image. The face-body matches in the correspondence set with the largest match score are thus optimal as a whole, and the obtained face-body matching relationships are more accurate overall.
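For small numbers of faces and masks, the correspondence-set enumeration described above can be sketched directly with permutations; the min-cost max-flow algorithm in the text computes the same optimum far more efficiently for large inputs. Representing a face frame by a single point (e.g., its center) and scoring a correspondence set as the negative sum of point-to-mask-top distances are simplifying assumptions.

```python
import itertools
import math


def match_faces_to_masks(face_centers, mask_tops):
    """Enumerate every one-to-one correspondence set between face frames
    and human body masks, score each set by the negative sum of first
    distances (face point to mask-top point), and return the assignment
    mask index per face from the set with the largest match score.
    Assumes equal counts of faces and masks."""
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    best_score, best_assignment = -math.inf, None
    for perm in itertools.permutations(range(len(mask_tops))):
        # One correspondence set: face i <-> mask perm[i].
        score = -sum(dist(face_centers[i], mask_tops[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_assignment = score, perm
    return list(best_assignment)
```

With faces at (0, 0) and (10, 0) and mask tops at (9, 1) and (1, 1), the globally optimal set pairs each face with the nearer mask top, even though a greedy per-face choice would consider the same pairs independently.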
In one possible implementation, after obtaining the matching relationship between the face in the face frame and the body in the human body mask, the method further includes: storing the matching relationship in a matching-relationship database used to store matching relationships between faces and bodies; in response to an identity-information query request for a target body, searching the matching-relationship database for the target body; in a case where the target body is found, determining the face that has a matching relationship with the target body; and determining the identity information of the target body according to the face.
The face-body matching method provided by the embodiments of the present disclosure has high application value in many fields. Based on face recognition, the identity information of a face can be obtained. Then, after the face-body matching relationship has been obtained, when an image captured by a camera contains only a body, the face that has a matching relationship with that body can be looked up based on the body, and the identity information is then determined from the matched face, i.e., the identity information of the body in the captured image is determined.
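The lookup flow just described can be sketched with plain dictionaries. Keying the database on simple string IDs for bodies and faces is a hypothetical simplification: a real system would retrieve the target body by body features rather than an exact key.

```python
def build_match_db(pairs):
    """Store face-body matching relationships, keyed by body ID
    (hypothetical string IDs for illustration)."""
    return {body_id: face_id for face_id, body_id in pairs}


def query_identity(match_db, face_identities, target_body_id):
    """Search the matching-relationship database for the target body;
    if found, return the identity information of the matched face,
    otherwise None."""
    face_id = match_db.get(target_body_id)
    if face_id is None:
        return None
    return face_identities.get(face_id)
```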
In some implementable embodiments, the face-body matching method provided by the embodiments of the present disclosure is described; the implementation includes the following steps:
Step S21: a target image is input, and based on instance segmentation, the faces and bodies in the target image are detected, yielding face frames and body frames, as well as the human body masks in the body frames.
Step S22: when a single body frame contains more than one human body mask, the two first human body masks with the largest areas in the single body frame are determined; when the ratio of the areas of the two first human body masks is greater than 0.6, the first human body mask with the larger area is kept and the other human body masks are deleted.
When the ratio of the areas of the two first human body masks is less than 0.6, the body frame and the human body masks in it are deleted.
In some embodiments, the position frame information of faces and bodies can be obtained with a face-body detection model, and a human instance segmentation model is used to extract mask information for the bodies in the body frames, such as image, area, mask bounding rectangle, and confidence. To process the generated candidate human body mask frames, remove redundant candidates, and obtain the best mask information, a non-maximum suppression algorithm can be used to filter the human body masks: iteratively, the mask frame with the highest confidence is compared against the other frames for overlap, the mask frames with large overlap are filtered out, and the retained mask frames, i.e., the at least one human body mask retained in the target image, are finally obtained.
In some embodiments, non-primary body frames also need to be filtered out. Since the matching method of the embodiments of the present disclosure uses human body masks, i.e., one body corresponds to one mask, when multiple human body masks appear in one body frame, the non-primary bodies must be filtered out so as not to affect subsequent applications such as face-body matching, partial information retrieval, extraction of overall attributes, and clustering of faces and bodies across multiple cameras. Here, the areas of the two largest human body masks are compared: if the ratio of the two largest areas is greater than the set threshold, the body frame needs to be filtered out; otherwise, only the mask information of the largest area is kept.
Step S23: when the target image contains multiple face frames and multiple human body masks, multiple correspondence sets are established according to the different ways of corresponding the face frames to the human body masks, a single correspondence set containing one one-to-one correspondence between the face frames and the human body masks.
In some embodiments, after filtering the body frames and the human body masks, the set of at least one face frame in the target image and the set of masked body frames are obtained, and the face frames and the masks then need to be matched.
In the matching process, to obtain more and more accurate results, the maximum flow in network flow is used to solve the bipartite matching problem, so the minimum-cost maximum-flow (MinCostMaxFlow) algorithm is used: "more" is guaranteed by the maximum flow, and "more accurate" by the minimum cost.
Step S24: the match score of a single correspondence set is determined according to the sum of the multiple first distances in the set, a first distance being the distance between a corresponding face frame and the top of the corresponding human body mask, the match score being negatively correlated with the sum of the first distances.
Step S25: the correspondences in the correspondence set with the largest match score are taken as the matching relationships between the face frames and the human body masks in the target image.
Fig. 3 is a schematic flow diagram of a face-body matching method according to an embodiment of the present disclosure. As shown in Fig. 3, at 301 the detected face frames and masked body frames in the target image are input; at 302 redundant body frames and the redundant masks within a body frame are filtered, and it is determined that the target image contains multiple face frames and multiple human body masks; at 303 matching is performed on the obtained multiple face frames and multiple masks in the target image, thereby determining the matching relationships between faces and bodies; finally, 304 is executed to output the matching relationships.
Fig. 4 is a schematic diagram of the effect of a face-body matching method according to an embodiment of the present disclosure: 401 is an actual body image captured in a shooting scene, 402 is the matching result of a matching algorithm using face frames and body frames, and 403 is the matching result of the face-body matching method of the embodiments of the present disclosure. It can be seen that the matching effect of the face-body matching method provided by the embodiments of the present disclosure is better than that of the matching algorithm using face frames and body frames.
Fig. 5 is a schematic diagram of an actual scenario of a face-body matching method according to an embodiment of the present disclosure: 501, 504, and 505 are at least one body frame in the target image, 502 is a face frame in the target image, and 503 is a human body mask in the target image. As shown in Fig. 5, matching according to the face-body matching method of the embodiments of the present disclosure yields an accurate matching relationship between faces and bodies.
It can be understood that the above method embodiments mentioned in the embodiments of the present disclosure can be combined with each other to form combined embodiments without departing from the principles and logic. Those skilled in the art can understand that, in the above methods of the implementations, the execution order of the steps should be determined by their functions and possible internal logic.
In addition, the embodiments of the present disclosure also provide an apparatus, an electronic device, a computer-readable storage medium, and a program for matching a human face with a human body, all of which can be used to implement any face-body matching method provided by the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding records in the method section.
Fig. 6 shows a block diagram of an apparatus for matching a human face with a human body according to an embodiment of the present disclosure. As shown in Fig. 6, the apparatus 60 includes:
a face frame determination unit 61 configured to determine at least one face frame in a target image;
a human body mask determination unit 62 configured to determine at least one human body mask in the target image;
a matching relationship determination unit 63 configured to obtain, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask.
In one possible implementation, the human body mask determination unit 62 includes:
a body frame determination unit configured to determine at least one body frame in the target image, and the human body masks in the at least one body frame;
a target human body mask determination unit configured to, in a case where a single body frame of the at least one body frame contains more than one human body mask, determine a target human body mask in the single body frame;
a human body mask deletion unit configured to delete human body masks other than the target human body mask in the single body frame.
In one possible implementation, the target human body mask determination unit includes:
a first human body mask determination subunit configured to determine the two first human body masks with the largest areas in a single body frame;
a target human body mask determination subunit configured to, in a case where the difference value between the areas of the two first human body masks is greater than a set threshold, take the first human body mask with the larger area as the target human body mask.
In one possible implementation, the apparatus further includes:
a body frame deletion unit configured to, in a case where the difference value between the areas of the two first human body masks is not greater than the set threshold, delete the single body frame and the human body masks in the single body frame.
In one possible implementation, the body frame determination unit includes:
a first body frame determination unit configured to, in a case where the target image contains multiple body frames, determine a first body frame with the highest confidence among the multiple body frames;
an overlap determination unit configured to determine the overlap between the first body frame and each second body frame, the second body frames being the body frames other than the first body frame among the multiple body frames;
a second body frame deletion unit configured to delete the second body frames whose overlap is greater than an overlap threshold;
an at-least-one-body-frame determination unit configured to determine the remaining second body frames and the first body frame as the at least one body frame.
In one possible implementation, the matching relationship determination unit 63 is configured to obtain, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the body in the human body mask.
In one possible implementation, the matching relationship determination unit 63 includes:
a correspondence set establishment unit configured to, in a case where the target image contains multiple face frames and multiple human body masks, establish multiple correspondence sets according to different ways of corresponding the face frames to the human body masks, a single correspondence set containing one group of one-to-one correspondences between the face frames and the human body masks;
a match score determination unit configured to determine the match score of a single correspondence set according to the sum of multiple first distances in the set, a first distance being the distance between a corresponding face frame and the top of the corresponding human body mask, the match score being negatively correlated with the sum of the first distances;
a matching relationship determination subunit configured to take the correspondences in the correspondence set with the largest match score as the matching relationships between the face frames and the human body masks in the target image.
In one possible implementation, the apparatus further includes:
a storage unit configured to store the matching relationship in a matching-relationship database, the matching-relationship database being used to store matching relationships between faces and bodies;
a search unit configured to, in response to an identity-information query request for a target body, search the matching-relationship database for the target body;
a face determination unit configured to, in a case where the target body is found, determine the face that has a matching relationship with the target body;
an identity information determination unit configured to determine the identity information of the target body according to the face.
In some embodiments, the functions or modules of the apparatus provided by the embodiments of the present disclosure can be used to perform the methods described in the method embodiments above; for their implementation and technical effects, refer to the descriptions of the method embodiments above.
Embodiments of the present disclosure also propose a computer-readable storage medium on which computer program instructions are stored, the computer program instructions implementing the above face-body matching method when executed by a processor. The computer-readable storage medium may be a non-volatile computer-readable storage medium.
Embodiments of the present disclosure also propose an electronic device, including: a processor; and a memory for storing processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the above face-body matching method.
Embodiments of the present disclosure also provide a computer program product, including computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the face-body matching method provided by any of the above embodiments.
Embodiments of the present disclosure also provide another computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the face-body matching method provided by any of the above embodiments.
The electronic device may be provided as a terminal, a server, or a device in another form.
Fig. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure. For example, the electronic device 800 may be a terminal such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, or a personal digital assistant.
Referring to Fig. 7, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls the overall operation of the electronic device 800, such as operations associated with display, phone calls, data communication, camera operation, and recording operation. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above method. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and other components; for example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation on the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, etc. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disc.
The power supply component 806 provides power to the various components of the electronic device 800. The power supply component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen providing an output interface between the electronic device 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC); when the electronic device 800 is in an operating mode, such as a call mode, a recording mode, or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be stored in the memory 804 or sent via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, etc. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing state evaluations of various aspects for the electronic device 800. For example, the sensor component 814 can detect the on/off state of the electronic device 800 and the relative positioning of components, e.g., the display and keypad of the electronic device 800; the sensor component 814 can also detect a change in position of the electronic device 800 or one of its components, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and temperature changes of the electronic device 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a complementary metal-oxide semiconductor (CMOS) or charge-coupled device (CCD) image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 can access a wireless network based on a communication standard, such as a wireless network (WiFi), the second-generation mobile communication technology (2G), the third-generation mobile communication technology (3G), or a combination thereof. In one exemplary embodiment, the communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 also includes a near-field communication (NFC) module to facilitate short-range communication; for example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In exemplary embodiments, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the above method.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 804 comprising computer program instructions executable by the processor 820 of the electronic device 800 to complete the above method.
Fig. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. Referring to Fig. 8, the electronic device 1900 includes a processing component 1922, which includes one or more processors, and memory resources represented by a memory 1932 for storing instructions, such as applications, executable by the processing component 1922. The applications stored in the memory 1932 may include one or more modules each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute instructions to perform the above method.
The electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 can operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system introduced by Apple (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open-source Unix-like operating system (Linux™), the open-source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-volatile computer-readable storage medium is also provided, such as the memory 1932 comprising computer program instructions executable by the processing component 1922 of the electronic device 1900 to complete the above method.
Embodiments of the present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions loaded thereon for causing a processor to implement various aspects of the embodiments of the present disclosure.
The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanically encoded devices such as punch cards or raised structures in grooves with instructions stored thereon, and any suitable combination of the above. A computer-readable storage medium as used here is not to be construed as transient signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (e.g., light pulses through fiber-optic cables), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber-optic transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions used to perform the operations of the embodiments of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuits, field programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), is personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry can execute the computer-readable program instructions to implement various aspects of the embodiments of the present disclosure.
Various aspects of the embodiments of the present disclosure are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to the embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; they cause a computer, programmable data processing apparatus, and/or other equipment to operate in a specific manner, so that the computer-readable medium on which the instructions are stored comprises an article of manufacture including instructions that implement various aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operational steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executing on the computer, other programmable data processing apparatus, or other equipment implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the drawings show the possible architectures, functions, and operations of systems, methods, and computer program products according to multiple embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two successive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The computer program product can be implemented in hardware, software, or a combination thereof. In one optional embodiment, the computer program product is embodied as a computer storage medium; in another optional embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), etc.
The embodiments of the present disclosure have been described above; the above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used here is chosen to best explain the principles of the embodiments, the practical application, or the improvement over technologies in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.
Industrial Applicability
The embodiments of the present disclosure disclose a method, an apparatus, an electronic device, a storage medium, and a program for matching a human face with a human body. The method includes: determining at least one face frame in a target image; determining at least one human body mask in the target image; and obtaining, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask. In this way, the embodiments of the present disclosure can improve the accuracy of face-body matching.

Claims (19)

  1. A method for matching a human face with a human body, the method being performed by an electronic device and comprising:
    determining at least one face frame in a target image;
    determining at least one human body mask in the target image;
    obtaining, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask.
  2. The method according to claim 1, wherein determining the at least one human body mask in the target image comprises:
    determining at least one body frame in the target image, and the human body masks in the at least one body frame;
    in a case where a single body frame of the at least one body frame contains more than one human body mask, determining a target human body mask in the single body frame;
    deleting human body masks other than the target human body mask in the single body frame.
  3. The method according to claim 2, wherein determining the target human body mask in the single body frame comprises:
    determining the two first human body masks with the largest areas in the single body frame;
    in a case where the difference value between the areas of the two first human body masks is greater than a set threshold, taking the first human body mask with the larger area as the target human body mask.
  4. The method according to claim 3, wherein after determining the two first human body masks with the largest areas in the single body frame, the method further comprises:
    in a case where the difference value between the areas of the two first human body masks is not greater than the set threshold, deleting the single body frame and the human body masks in the single body frame.
  5. The method according to any one of claims 2 to 4, wherein determining the at least one body frame in the target image comprises:
    in a case where the target image contains multiple body frames, determining a first body frame with the highest confidence among the multiple body frames;
    determining the overlap between the first body frame and each second body frame, the second body frames being the body frames other than the first body frame among the multiple body frames;
    deleting the second body frames whose overlap is greater than an overlap threshold;
    determining the remaining second body frames and the first body frame as the at least one body frame.
  6. The method according to any one of claims 1 to 5, wherein obtaining, based on the position of the face frame and the position of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask comprises:
    obtaining, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask.
  7. The method according to claim 6, wherein obtaining, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask comprises:
    in a case where the target image contains multiple face frames and multiple human body masks, establishing multiple correspondence sets according to different ways of corresponding the face frames to the human body masks, wherein a single correspondence set contains one group of one-to-one correspondences between the face frames and the human body masks;
    determining the match score of a single correspondence set according to the sum of multiple first distances in the set, wherein a first distance is the distance between a corresponding face frame and the top of the corresponding human body mask, and the match score is negatively correlated with the sum of the first distances;
    taking the correspondences in the correspondence set with the largest match score as the matching relationships between the face frames and the human body masks in the target image.
  8. The method according to any one of claims 1 to 7, wherein after obtaining the matching relationship between the face in the face frame and the human body in the human body mask, the method further comprises:
    storing the matching relationship in a matching-relationship database, the matching-relationship database being used to store matching relationships between faces and bodies;
    in response to an identity-information query request for a target body, searching the matching-relationship database for the target body;
    in a case where the target body is found, determining the face that has a matching relationship with the target body;
    determining the identity information of the target body according to the face.
  9. An apparatus for matching a human face with a human body, comprising:
    a face frame determination unit configured to determine at least one face frame in a target image;
    a human body mask determination unit configured to determine at least one human body mask in the target image;
    a matching relationship determination unit configured to obtain, based on the position of the face frame and the position of the human body mask, a matching relationship between the face in the face frame and the human body in the human body mask.
  10. The apparatus according to claim 9, wherein the human body mask determination unit comprises:
    a body frame determination unit configured to determine at least one body frame in the target image, and the human body masks in the at least one body frame;
    a target human body mask determination unit configured to, in a case where a single body frame of the at least one body frame contains more than one human body mask, determine a target human body mask in the single body frame;
    a human body mask deletion unit configured to delete human body masks other than the target human body mask in the single body frame.
  11. The apparatus according to claim 10, wherein the target human body mask determination unit comprises:
    a first human body mask determination subunit configured to determine the two first human body masks with the largest areas in a single body frame;
    a target human body mask determination subunit configured to, in a case where the difference value between the areas of the two first human body masks is greater than a set threshold, take the first human body mask with the larger area as the target human body mask.
  12. The apparatus according to claim 11, wherein the apparatus further comprises:
    a body frame deletion unit configured to, in a case where the difference value between the areas of the two first human body masks is not greater than the set threshold, delete the single body frame and the human body masks in the single body frame.
  13. The apparatus according to any one of claims 10 to 12, wherein the body frame determination unit comprises:
    a first body frame determination unit configured to, in a case where the target image contains multiple body frames, determine a first body frame with the highest confidence among the multiple body frames;
    an overlap determination unit configured to determine the overlap between the first body frame and each second body frame, the second body frames being the body frames other than the first body frame among the multiple body frames;
    a second body frame deletion unit configured to delete the second body frames whose overlap is greater than an overlap threshold;
    an at-least-one-body-frame determination unit configured to determine the remaining second body frames and the first body frame as the at least one body frame.
  14. The apparatus according to any one of claims 9 to 13, wherein the matching relationship determination unit is configured to obtain, based on the distance between the face frame and the top of the human body mask, the matching relationship between the face in the face frame and the human body in the human body mask.
  15. The apparatus according to claim 14, wherein the matching relationship determination unit comprises:
    a correspondence set establishment unit configured to, in a case where the target image contains multiple face frames and multiple human body masks, establish multiple correspondence sets according to different ways of corresponding the face frames to the human body masks, wherein a single correspondence set contains one group of one-to-one correspondences between the face frames and the human body masks;
    a match score determination unit configured to determine the match score of a single correspondence set according to the sum of multiple first distances in the set, wherein a first distance is the distance between a corresponding face frame and the top of the corresponding human body mask, and the match score is negatively correlated with the sum of the first distances;
    a matching relationship determination subunit configured to take the correspondences in the correspondence set with the largest match score as the matching relationships between the face frames and the human body masks in the target image.
  16. The apparatus according to any one of claims 9 to 15, wherein the apparatus further comprises:
    a storage unit configured to store the matching relationship in a matching-relationship database, the matching-relationship database being used to store matching relationships between faces and bodies;
    a search unit configured to, in response to an identity-information query request for a target body, search the matching-relationship database for the target body;
    a face determination unit configured to, in a case where the target body is found, determine the face that has a matching relationship with the target body;
    an identity information determination unit configured to determine the identity information of the target body according to the face.
  17. An electronic device, comprising:
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to invoke the instructions stored in the memory to perform the method for matching a human face with a human body according to any one of claims 1 to 8.
  18. A computer-readable storage medium on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the method for matching a human face with a human body according to any one of claims 1 to 8.
  19. A computer program, comprising computer-readable code, wherein, in a case where the computer-readable code runs in an electronic device, a processor of the electronic device executes instructions for implementing the method for matching a human face with a human body according to any one of claims 1 to 8.
PCT/CN2021/102829 2021-03-25 2021-06-28 Method, apparatus, electronic device, storage medium, and program for matching a human face with a human body WO2022198821A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110321139.6 2021-03-25
CN202110321139.6A CN112949568A (zh) 2021-03-25 Method and apparatus for matching a human face with a human body, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022198821A1 true WO2022198821A1 (zh) 2022-09-29

Family

ID=76228104

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102829 WO2022198821A1 (zh) 2021-03-25 2021-06-28 Method, apparatus, electronic device, storage medium, and program for matching a human face with a human body

Country Status (2)

Country Link
CN (1) CN112949568A (zh)
WO (1) WO2022198821A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949568A (zh) * 2021-03-25 2021-06-11 深圳市商汤科技有限公司 Method and apparatus for matching a human face and a human body, electronic device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107395965A (zh) * 2017-07-14 2017-11-24 维沃移动通信有限公司 Image processing method and mobile terminal
CN110889315A (zh) * 2018-09-10 2020-03-17 北京市商汤科技开发有限公司 Image processing method and apparatus, electronic device, and system
CN111476214A (zh) * 2020-05-21 2020-07-31 北京爱笔科技有限公司 Image region matching method and related apparatus
CN111709391A (zh) * 2020-06-28 2020-09-25 重庆紫光华山智安科技有限公司 Face and human body matching method, apparatus, and device
CN112949568A (zh) * 2021-03-25 2021-06-11 深圳市商汤科技有限公司 Method and apparatus for matching a human face and a human body, electronic device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109740516B (zh) * 2018-12-29 2021-05-14 深圳市商汤科技有限公司 User identification method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN112949568A (zh) 2021-06-11

Similar Documents

Publication Publication Date Title
TWI775091B (zh) Data update method, electronic device, and storage medium
WO2020135127A1 (zh) Pedestrian recognition method and apparatus
TWI766286B (zh) Image processing method and image processing apparatus, electronic device, and computer-readable storage medium
WO2021031609A1 (zh) Liveness detection method and apparatus, electronic device, and storage medium
TWI702544B (zh) Image processing method, electronic device, and computer-readable storage medium
WO2021036382A9 (zh) Image processing method and apparatus, electronic device, and storage medium
EP2998960B1 (en) Method and device for video browsing
WO2021093375A1 (zh) Method, apparatus, and system for detecting companions, electronic device, and storage medium
CN109934275B (zh) Image processing method and apparatus, electronic device, and storage medium
WO2020062969A1 (zh) Action recognition method and apparatus, and driver state analysis method and apparatus
CN110889382A (zh) Virtual avatar rendering method and apparatus, electronic device, and storage medium
WO2022099989A1 (zh) Liveness recognition and access control device control methods and apparatuses, electronic device, storage medium, and computer program
WO2022110614A1 (zh) Gesture recognition method and apparatus, electronic device, and storage medium
WO2019153925A1 (zh) Search method and related apparatus
CN109840917B (zh) Image processing method and apparatus, and network training method and apparatus
WO2022188305A1 (zh) Information display method and apparatus, electronic device, storage medium, and computer program
CN111553864A (zh) Image inpainting method and apparatus, electronic device, and storage medium
WO2021047069A1 (zh) Face recognition method and electronic terminal device
WO2022142298A1 (zh) Key point detection method and apparatus, electronic device, and storage medium
CN109344703B (zh) Object detection method and apparatus, electronic device, and storage medium
CN111523346A (zh) Image recognition method and apparatus, electronic device, and storage medium
WO2023040202A1 (zh) Face recognition method and apparatus, electronic device, and storage medium
WO2022198821A1 (zh) Method and apparatus for matching a human face and a human body, electronic device, storage medium, and program
CN112613447A (zh) Key point detection method and apparatus, electronic device, and storage medium
WO2023173659A1 (zh) Face matching method and apparatus, electronic device, storage medium, computer program product, and computer program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932437

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 22.01.2024)