WO2019105337A1 - Video-based face recognition method, apparatus, device, medium, and program - Google Patents

Video-based face recognition method, apparatus, device, medium, and program

Info

Publication number
WO2019105337A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
sequence
feature
video frame
person
Prior art date
Application number
PCT/CN2018/117662
Other languages
English (en)
French (fr)
Inventor
刘文韬
钱晨
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Publication of WO2019105337A1
Priority to US16/455,173 (US11068697B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993 Evaluation of the quality of the acquired pattern
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/167 Detection; Localisation; Normalisation using comparisons between temporally consecutive images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • The present application relates to computer vision technology, and more particularly to a video-based face recognition method, a video-based face recognition apparatus, an electronic device, a computer-readable storage medium, and a computer program.
  • Identifying the people in a video can provide information support for a variety of applications. For example, by identifying the people in a video, the subject of the video can be obtained; as another example, by identifying the people in videos, the videos can be classified and managed.
  • The embodiments of the present application provide a technical solution for video-based face recognition.
  • A video-based face recognition method, mainly comprising: forming a set of face sequences from face images that appear in a plurality of consecutive video frames of a video and whose positions in those video frames meet a predetermined displacement requirement, wherein a face sequence is a set of face images belonging to the same person among the plurality of video frames; and, for a set of face sequences, performing face recognition using a preset face database, at least according to the face features in the face sequence.
  • The face images that appear in a plurality of consecutive video frames of the video and whose positions in those video frames meet the predetermined displacement requirement include: face images of the same person that appear in adjacent video frames, where the displacement between the position of that person's face image in the previous video frame and its position in the next video frame meets the predetermined displacement requirement.
  • Forming a set of face sequences from such face images includes: acquiring the face images belonging to the same person among N consecutive video frames of the video, where N is an integer greater than two; determining, among the face images belonging to the same person, the face image pairs for which the displacement between the position in the previous video frame and the position in the next video frame meets the predetermined displacement requirement; and, if the intersection ratio of the face image pairs meeting the predetermined displacement requirement, among the face images belonging to the same person, satisfies a preset ratio, forming a set of face sequences from the face images belonging to the same person.
  • The face images of the same person include: face images whose face-feature similarity meets a predetermined similarity requirement.
  • Forming a set of face sequences from the face images that appear in a plurality of consecutive video frames of the video and whose positions in those video frames meet the predetermined displacement requirement includes: creating a set of face sequences for each of one or more of the at least one face image in the video frame in which a face first appears in the video; creating a set of face sequences for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the next video frame; and assigning face images of the same person that appear in the previous video frame and the next video frame at continuous spatial positions to the face sequence of that person.
  • Assigning face images of the same person whose spatial positions are continuous in the previous video frame and the next video frame to the face sequence of that person includes: separately acquiring the face feature of at least one face image in the previous video frame, the position of that face image in the previous video frame, the face feature of at least one face image in the next video frame, and the position of that face image in the next video frame; determining, according to those positions, the face image pairs whose displacement meets the predetermined displacement requirement; and, for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features satisfies the predetermined similarity requirement, determining that the face image in the next video frame of the pair belongs to the face sequence to which the face image in the previous video frame belongs.
  • Creating a set of face sequences for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the next video frame includes: for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features does not satisfy the predetermined similarity requirement, creating a face sequence for the face image in the next video frame of the pair.
  • The method further includes acquiring face features, which includes: performing face detection on a video frame with a face detector to obtain bounding-box information of at least one face image in the video frame; and providing the video frame and the bounding-box information of the at least one face image to a neural network for extracting face features, and obtaining the face feature of the at least one face image via the neural network.
  • After forming a set of face sequences, and before performing face recognition on a set of face sequences using the preset face database at least according to the face features in the face sequence, the method further includes: clustering at least some of the face sequences according to their face features, so as to merge different face sequences that correspond to the same person; after the clustering, different face sequences correspond to different people.
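  • The merging step above can be illustrated with a minimal sketch. The text does not specify a clustering algorithm, so everything here is an assumption: the greedy single-pass strategy, the use of cosine similarity, the threshold `min_similarity`, and the function names are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_sequences(sequence_features, min_similarity=0.8):
    """Greedily group face sequences whose features are similar.

    sequence_features: one feature vector per face sequence.
    Returns a list of clusters, each a list of indices into sequence_features.
    """
    clusters = []   # indices of the sequences in each cluster
    centroids = []  # running mean feature of each cluster
    for idx, feat in enumerate(sequence_features):
        best, best_sim = None, min_similarity
        for c, centroid in enumerate(centroids):
            sim = cosine_similarity(feat, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # No sufficiently similar cluster: treat as a new person.
            clusters.append([idx])
            centroids.append(list(feat))
        else:
            # Merge into the best cluster and update its mean feature.
            n = len(clusters[best])
            centroids[best] = [(m * n + x) / (n + 1)
                               for m, x in zip(centroids[best], feat)]
            clusters[best].append(idx)
    return clusters
```

  • A greedy pass is order-dependent; a production system would more likely use an established clustering method, which this sketch does not attempt to reproduce.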
  • The face feature of a face sequence includes a weighted average of the face features of at least some of the face images in the face sequence.
  • The weights of the face features of at least some of the face images in the face sequence are determined according to the face image quality of those face images.
  • The face image quality includes at least one of: the light intensity of the face image, the sharpness of the face image, and the face orientation.
  • The face database includes the face features of a plurality of people; for any one of those people, the face features include a comprehensive face feature of the person and the person's face features in different pictures.
  • The comprehensive face feature includes a weighted average of the person's face features in different pictures.
  • Performing face recognition on a set of face sequences using the preset face database, at least according to the face features in the face sequence, includes: for at least one face feature in a set of face sequences, calculating the similarity between that face feature and the comprehensive face feature of at least one person in the face database, and determining the person in the face database with the highest similarity; voting over the persons with the highest similarity determined for at least some of the face features in the face sequence, and taking the person with the most votes as the person to whom the face sequence belongs; and, for the face sequence, determining the confidence that the face sequence belongs to that person, according to at least the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face database, of the person to whom the face sequence belongs.
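  • The voting step above can be sketched as follows. This is a minimal illustration, not the application's implementation: cosine similarity is assumed as the similarity measure, and the function names and the dictionary layout of the face database are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_by_voting(sequence_features, database):
    """Pick the person a face sequence belongs to by majority vote.

    sequence_features: face features of the face images in one face sequence.
    database: dict mapping a person to that person's comprehensive face feature.
    Returns (winning person, Counter of votes).
    """
    votes = Counter()
    for feat in sequence_features:
        # Each face feature votes for the most similar person in the database.
        best = max(database, key=lambda p: cosine_similarity(feat, database[p]))
        votes[best] += 1
    person, _ = votes.most_common(1)[0]
    return person, votes
```

  • For example, with two database entries and three face features of which two are closest to the first entry, the first entry receives two votes and is returned as the person to whom the sequence belongs.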
  • Determining the confidence that the face sequence belongs to the person, according to at least the similarity between at least one face feature in the face sequence and the face features in different pictures of that person in the face database, includes: for any face feature in the face sequence, calculating the similarity between that face feature and the face feature in a face feature set whose face pose is most similar to the face pose of that face feature; and determining the confidence that the face sequence belongs to the person according to the calculated similarities; wherein the face feature set includes the face features of the person in different pictures in the face database.
  • Determining the confidence that the face sequence belongs to the person, according to at least the similarity between at least one face feature in the face sequence and the face features in different pictures of that person in the face database, includes: determining, according to the face key points in the face sequence and the face key points in the face feature set, the face feature in the face feature set whose face pose is most similar to a face pose in the face sequence.
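  • A minimal sketch of this pose-matched confidence follows. The text does not say how pose similarity is computed from key points or how the per-face similarities are combined, so the mean key-point distance, the cosine similarity, the averaging into a single confidence, and all names are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keypoint_distance(kps_a, kps_b):
    """Mean Euclidean distance between corresponding key points (assumed pose proxy)."""
    return sum(math.dist(p, q) for p, q in zip(kps_a, kps_b)) / len(kps_a)


def sequence_confidence(sequence_faces, person_faces):
    """Confidence that a face sequence belongs to a person.

    sequence_faces: list of (feature, key_points) for the face sequence.
    person_faces: list of (feature, key_points) for the person's pictures
                  in the face database (the face feature set).
    """
    sims = []
    for feat, kps in sequence_faces:
        # Pick the database picture whose pose is closest to this face.
        ref_feat, _ = min(person_faces,
                          key=lambda entry: keypoint_distance(kps, entry[1]))
        sims.append(cosine_similarity(feat, ref_feat))
    # Averaging is an assumption; the text only says the confidence is
    # determined from the calculated similarities.
    return sum(sims) / len(sims)
```

  • Key points are assumed to be normalized to a common coordinate frame before comparison; without that, distances would mix pose differences with differences in face size and position.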
  • The method further includes: obtaining the face key points of the at least one face image of the video frame via the neural network.
  • Determining the confidence that the face sequence belongs to the person, according to at least the similarity between at least one face feature in the face sequence and the face features in different pictures of that person in the face database, further includes: correcting the confidence that the face sequence belongs to the person using the similarity between the face feature of the face sequence and the comprehensive face feature of the person to whom the face sequence belongs.
  • A video-based face recognition apparatus, comprising: a face sequence forming module, configured to form a set of face sequences from face images that appear in a plurality of consecutive video frames of a video and whose positions in those video frames meet a predetermined displacement requirement, wherein a face sequence is a set of face images belonging to the same person among the plurality of video frames; and a face recognition module, configured to perform face recognition on a set of face sequences using a preset face database, at least according to the face features in the face sequence.
  • The apparatus further includes: a face feature acquiring module, configured to perform face detection on a video frame with a face detector to obtain bounding-box information of at least one face image in the video frame, and to provide the video frame and the bounding-box information of the at least one face image to a neural network for extracting face features, via which the face feature of the at least one face image in the video frame is obtained.
  • The apparatus further includes: a face sequence clustering module, configured to cluster at least some of the face sequences according to their face features, so as to merge different face sequences that correspond to the same person; after the clustering, different face sequences correspond to different people, and the clustered face sequences are provided to the face recognition module.
  • An electronic device, comprising the apparatus of any of the above embodiments.
  • An electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, wherein the method of any of the above embodiments is implemented when the computer program is executed.
  • A computer-readable storage medium having stored thereon a computer program that, when executed by a processor, implements the method of any of the above embodiments.
  • A computer program comprising computer instructions that, when run on a processor of a device, implement the method of any of the above embodiments.
  • The present application forms face sequences by exploiting the fact that the face of the same person is continuous in time and in spatial position within a video, so that the faces of the same person appearing continuously in the video can be quickly and accurately placed in the same set of face sequences. By then performing face recognition, using the face database, on the face sequences obtained in this way, it is possible to quickly and accurately identify whether a person in the video is a person in the face database.
  • FIG. 3 is a schematic diagram of an embodiment of the face sequence clustering process of the present application.
  • FIG. 4 is a flow chart of one embodiment of forming a face sequence in the present application.
  • FIG. 5 is a schematic structural diagram of an embodiment of the apparatus of the present application.
  • FIG. 6 is a block diagram of an exemplary apparatus for implementing an embodiment of the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
  • Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system.
  • Generally, program modules may include routines, programs, object programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • The computer system/server can be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices linked through a communication network.
  • In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media that include storage devices.
  • FIG. 1 is a flow chart of one embodiment of the method of the present application. As shown in FIG. 1, the method of this embodiment includes operation S100 and operation S110.
  • A face sequence is a set of face images belonging to the same person among a plurality of video frames.
  • The video in the present application may be a video in RGB form, or a video in another form.
  • The video may be a video of real people, in which case the faces in the corresponding face images are real faces; the video may also be a video of drawn characters, in which case the faces in the corresponding face images are drawn faces, for example, an animated cartoon. The present application does not limit the form in which the video is presented.
  • The present application can acquire the face images belonging to the same person in N consecutive video frames of a video (N is an integer greater than 2), and determine, among those face images, the face image pairs for which the displacement between the position in the previous video frame and the position in the next video frame meets the predetermined displacement requirement; if the intersection ratio of the face image pairs meeting the predetermined displacement requirement, among the face images belonging to the same person, satisfies the preset ratio, the face images belonging to the same person form a set of face sequences.
  • The predetermined displacement requirement and the preset ratio in the present application may be related to each other: when the predetermined displacement requirement is relatively strict (for example, the displacement distance is required to be relatively small), the range of the preset ratio can be relatively small; when the predetermined displacement requirement is relatively loose (for example, the displacement distance is allowed to be relatively large), the range of the preset ratio can be relatively large.
  • The temporal continuity of a face image in the present application generally means that the face image of the same person appears in at least two video frames that are played consecutively.
  • The spatial continuity of a face image in the present application generally means that the face image of the same person appears at substantially the same position in two consecutively played video frames. That is, when the face image of the same person appears in two adjacent video frames, and the displacement between its position in the previous video frame and its position in the next video frame meets the predetermined displacement requirement, the face image can be considered continuous in both time and spatial position.
  • The face images of the same person mentioned above may be face images whose face-feature similarity meets the predetermined similarity requirement.
  • The predetermined displacement requirement may be set according to actual needs. For example, the position range in the previous video frame is expanded by a predetermined multiple (e.g., 1.1 to 1.5 times); if the position in the next video frame lies within the expanded position range, the predetermined displacement requirement is considered to be met.
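  • The expanded-range check in this example can be sketched as below. The sketch assumes that a "position range" is an axis-aligned bounding box described by its center and size, that expansion scales the box about its center, and that "lies within" means the next box is fully contained in the expanded box; the `Box` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box given by its center and size (assumed layout)."""
    cx: float
    cy: float
    w: float
    h: float


def expand(box: Box, multiple: float) -> Box:
    """Scale the box about its center by `multiple` (e.g. 1.1 to 1.5)."""
    return Box(box.cx, box.cy, box.w * multiple, box.h * multiple)


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely inside `outer`."""
    return (
        inner.cx - inner.w / 2 >= outer.cx - outer.w / 2
        and inner.cx + inner.w / 2 <= outer.cx + outer.w / 2
        and inner.cy - inner.h / 2 >= outer.cy - outer.h / 2
        and inner.cy + inner.h / 2 <= outer.cy + outer.h / 2
    )


def meets_displacement_requirement(prev_box: Box, next_box: Box,
                                   multiple: float = 1.2) -> bool:
    """Check the next-frame box against the expanded previous-frame box."""
    return contains(expand(prev_box, multiple), next_box)
```

  • A larger `multiple` tolerates faster motion between frames at the cost of more candidate pairs that must then be separated by the face-feature similarity check.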
  • Based on the continuity of the face images in the video in time and in spatial position, the present application generally forms one or more sets of face sequences, and all face images in each set of face sequences belong to the same person. Each set of face sequences typically includes one or more face images.
  • Any set of face sequences formed by the present application generally includes the face feature of at least one face image, and may also include, in addition to the face feature, the face key points of at least one face image. The information contained in any set of face sequences formed by the present application may also be other forms of information capable of uniquely describing the characteristics of a person's face image.
  • The present application may use existing face feature extraction technology and existing face key point detection technology to obtain the face feature and the face key points of at least one face image in a video frame. For example, the present application can obtain them through a face detector and a neural network for extracting face features: at least one video frame of the video is provided to the face detector, which performs face detection on the input video frame; if the face detector detects a face image in the video frame, it outputs the bounding-box information of the at least one detected face image (for example, the length and width of the bounding box and its center position); the present application can crop the corresponding video frame according to the bounding-box information to obtain at least one face image block in that video frame, and the at least one face image block can be input, after resizing, to the neural network for extracting face features, so ...
  • The present application can use an existing neural network to obtain face features and face key points.
  • The network structure of the neural network can be flexibly designed according to actual needs, and the present application does not limit it.
  • For example, the neural network may include, but is not limited to, convolutional layers, nonlinear ReLU layers, pooling layers, and fully connected layers; the more layers, the deeper the network. As another example, the network structure may adopt, but is not limited to, the structure of networks such as AlexNet, Deep Residual Network (ResNet), or VGGnet (Visual Geometry Group Network).
  • The present application does not limit how the face features and face key points of at least one face image in a video frame are obtained.
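  • The cropping and resizing of a face image block described above can be sketched in plain Python. The face detector and the feature-extraction network are outside the scope of this sketch; the frame is assumed to be a list of pixel rows, the bounding box is assumed to be given by center and size, nearest-neighbor resizing is an arbitrary choice, and the function names are hypothetical.

```python
def crop_face(frame, cx, cy, w, h):
    """Cut the bounding box (center cx, cy; size w x h) out of `frame`.

    frame: list of rows, each row a list of pixels.
    The box is clamped to the frame borders.
    """
    height, width = len(frame), len(frame[0])
    left = max(0, int(round(cx - w / 2)))
    top = max(0, int(round(cy - h / 2)))
    right = min(width, int(round(cx + w / 2)))
    bottom = min(height, int(round(cy + h / 2)))
    return [row[left:right] for row in frame[top:bottom]]


def resize_nearest(block, out_w, out_h):
    """Resize a non-empty image block to out_w x out_h by nearest-neighbor sampling."""
    in_h, in_w = len(block), len(block[0])
    return [
        [block[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```

  • The resized block would then be passed to whatever feature-extraction network is in use; the input size that network expects determines `out_w` and `out_h`.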
  • The process of forming a face sequence in the present application generally includes: creating a face sequence, and determining the face sequence to which a face image in a video frame belongs.
  • An optional example of creating a face sequence: when the video frame in which a face first appears in the video is detected, a set of face sequences is created for each of one or more of the at least one face image in that video frame. For example, no face image is detected in the first to fourth video frames of the video, and three face images, namely a first face image, a second face image, and a third face image, are detected in the fifth video frame; the present application then creates a set of face sequences for each of the three face images, namely a first face sequence, a second face sequence, and a third face sequence. The first face sequence may contain the face feature and face key points of the first face image, the second face sequence may contain the face feature and face key points of the second face image, and the third face sequence may contain the face feature and face key points of the third face image.
  • Another optional example of creating a face sequence: detecting one or more face images that do not appear in the previous video frame but appear in the next video frame, and creating a set of face sequences for each such face image. For example, for a fourth face image and a fifth face image detected in this way, the present application creates a set of face sequences for each, namely a fourth face sequence and a fifth face sequence; the fourth face sequence may contain the face feature and face key points of the fourth face image, and the fifth face sequence may contain the face feature and face key points of the fifth face image. Alternatively, the present application creates a set of face sequences for the fourth face image, namely the fourth face sequence, which may contain the face feature and face key points of the fourth face image.
  • An optional example of determining the face sequence to which a face image in a video frame belongs: face images of the same person whose spatial positions are continuous in the previous video frame and the next video frame are assigned to the face sequence of that person. For example, the present application can assign the first face image that appears in the sixth video frame at a position continuous with its position in the fifth video frame to the first face sequence, assign the second face image that appears in the sixth video frame at a position continuous with its position in the fifth video frame to the second face sequence, and assign the third face image that appears in the sixth video frame at a position continuous with its position in the fifth video frame to the third face sequence; that is, the face feature and face key points of each of these face images in the sixth video frame are added to the first, second, and third face sequences, respectively.
  • As another example, the present application may assign the first face image that appears in the sixth video frame at a position continuous with its position in the fifth video frame to the first face sequence, and assign the second face image that appears in the sixth video frame at a position continuous with its position in the fifth video frame to the second face sequence; that is, the face feature and face key points of the first face image in the sixth video frame are added to the first face sequence, and the face feature and face key points of the second face image in the sixth video frame are added to the second face sequence.
  • Face images of the same person whose spatial positions are continuous in the previous video frame and the next video frame need to be assigned to the face sequence of that person.
  • An optional example of how this assignment is implemented: for any two adjacent video frames in the video, first, acquire the face feature of at least one face image in the previous video frame, the position of that face image in the previous video frame, the face feature of at least one face image in the next video frame, and the position of that face image in the next video frame; then, according to those positions, determine the face image pairs whose displacement meets the predetermined displacement requirement. If no face image pair meets the predetermined displacement requirement, it is determined that a new face sequence needs to be created for the face image in the next video frame. If there are face image pairs whose displacement meets the predetermined displacement requirement, then for each such pair, determine whether the similarity of the pair's face features satisfies the predetermined similarity requirement; if it does, it can be determined that the face image in the next video frame of the pair belongs to the face sequence to which the face image in the previous video frame belongs.
  • the method for forming a face sequence in the present application can quickly and accurately divide at least one face image in at least one video frame of the video into the corresponding face sequence, which is beneficial to improving the efficiency and accuracy of the face recognition.
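  • The pairing procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the displacement requirement is modeled as a Euclidean distance limit between face centers (`max_shift`), the similarity requirement as a cosine-similarity threshold (`min_sim`), and the names `assign_faces`, `pos`, `feat`, and `seq` are hypothetical. Creating a new sequence when the similarity requirement is not met follows the device description later in this document.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_faces(prev_faces, next_faces, sequences, max_shift=20.0, min_sim=0.8):
    """Assign each face of the next frame to a face sequence.

    prev_faces: list of {"pos": (x, y), "feat": [...], "seq": int}
    next_faces: list of {"pos": (x, y), "feat": [...]}
    sequences:  dict mapping sequence id -> list of features (mutated in place)
    The thresholds are illustrative assumptions, not values from the source.
    """
    for face in next_faces:
        # Pairs whose displacement meets the (assumed) displacement requirement.
        candidates = [p for p in prev_faces
                      if math.dist(p["pos"], face["pos"]) <= max_shift]
        # Among those pairs, keep the one with the most similar face feature.
        best = max(candidates,
                   key=lambda p: cosine(p["feat"], face["feat"]),
                   default=None)
        if best is not None and cosine(best["feat"], face["feat"]) >= min_sim:
            # Same person: join the sequence of the previous-frame face.
            face["seq"] = best["seq"]
        else:
            # No suitable pair: create a new face sequence.
            face["seq"] = max(sequences, default=-1) + 1
            sequences[face["seq"]] = []
        sequences[face["seq"]].append(face["feat"])
    return next_faces
```

  • In this sketch the faces of the next frame, once labeled, would serve as `prev_faces` for the following pair of adjacent frames.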
  • An optional implementation of forming a face sequence in the present application is described below with reference to the accompanying drawings.
  • The operation S100 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a face sequence forming module 500 executed by the processor.
  • S110: For a set of face sequences, perform face recognition using a preset face database, based at least on the face features in the face sequence.
  • the present application may perform face recognition by using a preset face database according to a face feature of a face sequence and a face feature of at least part of the face image in the face sequence. That is to say, the face sequence in the present application has a face feature, and the face feature of the face sequence is generally a face feature obtained by comprehensively considering the face feature of at least part of the face image in the face sequence.
  • The face features of at least part of the face images in the face sequence each correspond to a weight (that is, each of those face images corresponds to a weight). Weighting the face features of those face images with their respective weights yields a weighted average of the face features of at least part of the face images.
  • This weighted average can be used as the face feature of the face sequence.
  • the face feature of the face sequence in the present application may also be referred to as a comprehensive face feature of the face sequence.
  • the application may determine, according to the face image quality of at least part of the face image in the face sequence, a weight corresponding to each of the face features of the face image in the face sequence.
  • The face image quality in the present application may include one or more of the light intensity of the face image, the sharpness of the face image, and the face orientation. The face orientation may be obtained from the face key points; for example, the present application can apply an existing calculation method to the face key points to obtain the face orientation.
  • The relative weighting of these three factors can be determined according to the actual situation.
  • This application does not limit the relationship between face image quality and weight.
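  • As an illustration of the quality-weighted average described above, the following minimal sketch combines three quality factors into one score and uses the scores as weights. It assumes each factor is already normalized to [0, 1] and that the combination is a simple linear mix; `quality_score`, `weighted_sequence_feature`, and `ratios` are hypothetical names, and the source leaves the actual relationship between quality and weight open.

```python
def quality_score(light, sharpness, frontalness, ratios=(1.0, 1.0, 1.0)):
    """Combine three quality factors (assumed to lie in [0, 1]) into one score.

    `ratios` stands in for the relative weighting of the three factors,
    which the text says is chosen according to the actual situation.
    """
    r_light, r_sharp, r_front = ratios
    return r_light * light + r_sharp * sharpness + r_front * frontalness


def weighted_sequence_feature(features, weights):
    """Weighted average of per-image face features (lists of equal length)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(features[0])
    return [
        sum(w * f[i] for f, w in zip(features, weights)) / total
        for i in range(dim)
    ]
```

  • For example, `weighted_sequence_feature([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])` gives more influence to the first, higher-quality image. Setting all weights equal reduces this to the plain average mentioned as an alternative.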
  • the present application can utilize the prior art to evaluate the face image quality of at least part of the face image in the face sequence.
  • This application does not limit the implementation of evaluating face image quality.
  • The present application can also obtain the face feature of a face sequence by means other than weighted averaging. For example, averaging the face features of at least part of the face images in the face sequence yields a face average feature based on those face images, and the present application can use this face average feature as the face feature of the face sequence.
  • the present application does not limit the implementation of determining facial features of a face sequence based on at least some of the facial features in the face sequence.
  • The present application can thereby avoid the adverse effects of poor-quality face features on face recognition. For example, completely ignoring poor-quality face features may reduce the accuracy of determining the main person of a video; as another example, treating poor-quality face features the same as better-quality face features may reduce how accurately the face feature of the face sequence is described.
  • For any set of face sequences among all face sequences, the present application may perform face recognition using a preset face database, based on the face feature of the face sequence and the face features of at least part of the face images in the face sequence, for example, identifying the person to whom the face belongs and the confidence that the face belongs to that person. In an optional example, for any set of face sequences among all face sequences, the present application may determine, based on the face feature of the face sequence and the face features of at least part of the face images in the face sequence, the confidence that the person corresponding to the face sequence is a person in the preset face database, and then determine, according to that confidence, whether the person corresponding to the face sequence is a person in the face database.
  • the present application can determine, for each group of face sequences, whether the person corresponding to the face sequence is a person in the face database.
  • The present application can also determine whether the person corresponding to a face sequence is a person in the face database for only part of the face sequences.
  • The present application can also determine whether the person corresponding to a face sequence is a person in the face database for only the face sequence that includes the largest number of face images.
  • The present application presets a face database, which includes the face features of a plurality of persons.
  • The face features of a person in the face database usually include two parts: one part is the person's comprehensive face feature, and the other part is the person's face features in different pictures (such as photos or video frames).
  • the integrated face feature is usually a face feature obtained by comprehensively considering the face features of the person in different pictures.
  • The face features of the person's faces in different pictures each correspond to a weight.
  • Weighting the face features of the person's faces in different pictures with their respective weights yields a weighted average of those face features, and the present application can use this weighted average as the comprehensive face feature of the person in the face database.
  • the present application may determine, according to the face image quality of the face in different pictures, the weights corresponding to the face features of the face in different pictures.
  • the above-described face image quality may include one or more of a light intensity of a face image, a sharpness of a face image, and a face orientation.
  • the higher the face image quality the larger the weight corresponding to the face image; and the lower the face image quality, the smaller the weight corresponding to the face image.
  • This application does not limit the relationship between face image quality and weight.
  • the present application can utilize the prior art to evaluate the face image quality of faces in a face database in different pictures. This application does not limit the implementation of evaluating face image quality.
  • The present application can also obtain the integrated face feature of each person in the face database by means other than a weighted average. For example, for a person in the face database, averaging the face features of that person's faces in different pictures
  • yields a face average feature for the person, and the present application can use this face average feature as the comprehensive face feature of the person in the face database.
  • the present application does not limit the implementation of determining the integrated facial features of anyone in the face database.
  • the face database of the present application further includes: a face key point of the face in different pictures.
  • the method of determining the integrated face feature may be the same as the method of determining the face feature of the face sequence.
  • The similarity between at least one face feature (e.g., every face feature) in the face sequence and the integrated face feature of at least one person in the face database is calculated, and the highest similarity is selected for each of those face features. The persons in the face database corresponding to the highest similarities are then used to vote, and the present application can determine the person to whom the face sequence belongs according to the voting result.
  • For example, for any set of face sequences among all face sequences: calculate the similarity between the first face feature in the face sequence and the integrated face feature of the first person in the face database, of the second person, ..., and of the last person (for example, the Nth person), thereby obtaining N similarities, and select the highest of the N similarities; calculate the similarity between the second face feature in the face sequence and the integrated face feature of the first person, of the second person, ..., and of the last person, thereby obtaining another N similarities, and select the highest of them;
  • and so on, until the similarities between the Mth (for example, the last) face feature in the face sequence and the integrated face features of the first person through the last person have been calculated and the highest of them selected.
  • In this way, the present application obtains M highest similarities and uses the persons in the face database corresponding to those M highest similarities to vote. For example, if M-1 (M>2) of the highest similarities correspond to the first person in the face database,
  • the present application can determine, according to the voting result, that the face sequence belongs to the first person in the face database.
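  • The voting step described above can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the similarity measure (the source does not fix one); `vote_identity` and the dictionary layout of the database are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vote_identity(sequence_features, integrated_features):
    """Pick the database person with the most votes.

    sequence_features:   the M face features of one face sequence
    integrated_features: dict mapping person name -> integrated face feature
    Each of the M features votes for the person whose integrated feature
    gives the highest of its N similarities.
    """
    votes = Counter()
    for feat in sequence_features:
        best_person = max(
            integrated_features,
            key=lambda name: cosine(feat, integrated_features[name]),
        )
        votes[best_person] += 1
    winner, _ = votes.most_common(1)[0]
    return winner, dict(votes)
```

  • How ties between equally voted persons are resolved is not specified in the text; this sketch simply takes whichever tied person `Counter.most_common` lists first.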
  • the present application does not limit the implementation of calculating the similarity between the face features and the integrated face features in the face sequence.
  • The present application may determine the confidence that the face sequence belongs to a person in the face database according to at least one face feature in the face sequence and the face features, in different pictures, of the person in the face database to whom the face sequence belongs.
  • In an optional example, for convenience of description, the face features in different pictures of the person in the face database to whom the face sequence belongs are regarded as a face feature set. For the face feature of each face image in the face sequence, the present application searches the face feature set for the face feature whose face pose is most similar to that of the face feature.
  • A face feature in the face sequence and the face feature in the face feature set whose pose is most similar to it form a face feature pair, and the similarity between the two face features of the pair (e.g., a confidence) is calculated. The present application thus obtains a similarity for each of one or more face features in the face sequence,
  • and can determine, based on all the similarities calculated for the face sequence, the confidence that the face sequence belongs to the person in the face database.
  • In addition, the present application can correct the confidence using the face feature of the face sequence. In an optional example, the present application calculates the similarity between the face feature of the face sequence and the comprehensive face feature of the person in the face database to whom the face sequence belongs.
  • If this similarity is less than the above confidence, the similarity may be used as the confidence that the face sequence belongs to that person in the face database.
  • Otherwise, the confidence that the face sequence belongs to that person in the face database is not updated using the similarity.
  • By determining the confidence from at least one face feature in the face sequence and the face feature in the face feature set whose pose is most similar to it, the present application can avoid the influence of face pose differences on the accuracy of the confidence calculation.
  • By calculating the similarity between the face feature of the face sequence and the comprehensive face feature of the person in the face database to whom the face sequence belongs, and correcting the determined confidence with that similarity, the present application can avoid
  • errors in determining the person in the face database to whom the face sequence belongs that arise when the face orientations in the face sequence are too uniform (for example, all faces facing left), which helps improve the accuracy of face recognition.
  • The present application can determine the face pose using face key points, and thereby determine the face feature in the face feature set whose pose is most similar to that of a face in the face sequence.
  • For example, the face sequence includes the face feature and face key points (hereinafter, first face key points) of each face, and the face database includes the face features and face key points (hereinafter, second face key points) of the person in different pictures.
  • The present application can map the first face key points of a face in the face sequence into a standard blank image, and likewise map the second face key points of the person in the face database in different pictures into the standard blank image.
  • By comparing the positions of at least one of the first face key points with the positions of at least one of the second face key points, the face feature with the most similar pose can be selected from the face feature set according to the comparison result.
  • The face pose in the present application generally expresses the face orientation, the facial expression, and the like.
  • The face pose can usually be determined from the face key points; the present application can regard the face pose as the physical meaning of the face key points.
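  • The key-point comparison described above can be sketched as follows. This is only one possible reading: mapping key points into a "standard blank image" is modeled here as rescaling them into a unit square via their bounding box, and the position comparison as a sum of squared distances between corresponding points. Both choices, and the names `normalize_keypoints` and `most_similar_pose`, are assumptions for illustration.

```python
def normalize_keypoints(points):
    """Rescale (x, y) key points into the unit square (assumed 'standard image')."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = (max(xs) - min(xs)) or 1.0   # avoid division by zero
    height = (max(ys) - min(ys)) or 1.0
    return [((x - min(xs)) / width, (y - min(ys)) / height) for x, y in points]


def most_similar_pose(query_keypoints, feature_set):
    """Return the entry of `feature_set` whose pose is closest to the query.

    query_keypoints: first face key points of one face in the face sequence
    feature_set:     list of {"keypoints": [...], "feat": [...]} built from the
                     person's pictures in the face database (second key points)
    Key points are assumed to be listed in the same order everywhere.
    """
    query = normalize_keypoints(query_keypoints)

    def pose_distance(entry):
        other = normalize_keypoints(entry["keypoints"])
        return sum((qx - ox) ** 2 + (qy - oy) ** 2
                   for (qx, qy), (ox, oy) in zip(query, other))

    return min(feature_set, key=pose_distance)
```

  • The face feature of the returned entry would then be paired with the query face's feature to compute the pair similarity.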
  • the present application can determine whether any face in the video belongs to a corresponding person in the face database according to the finally determined confidence level.
  • the judgment result of the present application can be applied to various applications, for example, determining a video subject person, determining all characters in the video, or determining a video related to a certain person, etc., so that the present application can realize automatic management of the video.
  • the operation S110 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a face recognition module 510 that is executed by the processor.
  • the method of this embodiment mainly includes: operation S200, operation S210, operation S220, operation S230, operation S240, operation S250, and operation S260.
  • S200: Form a set of face sequences for face images in a video that appear in a plurality of consecutive video frames and whose positions in those video frames meet a predetermined displacement requirement. That is, the present application can form at least one set of face sequences according to the continuity of faces in the video in time sequence and spatial position.
  • Each set of face sequences has a face feature, namely the face feature of the face sequence; in addition, each set of face sequences usually includes the face features of one or more face images. All face features in a set of face sequences belong to the same person.
  • The operation S200 may be performed by a processor invoking corresponding instructions stored in a memory, or may be performed by a face sequence forming module 500 executed by the processor.
  • S210: Perform clustering on at least part of the face sequences to merge the different face sequences that correspond to the same person.
  • In FIG. 3, each circle represents a set of face sequences.
  • The faces of the first main character and the second main character in the video may form 11 sets of face sequences because of their discontinuity in time sequence or spatial position.
  • During clustering, the 11 sets of face sequences are divided into two classes: the class formed by the six sets of face sequences on the upper right side of FIG. 3, and the class formed by the five sets of face sequences on the lower right side of FIG. 3.
  • The present application can merge the six sets of face sequences on the upper right side of FIG. 3 into one set of face sequences, and merge the five sets of face sequences on the lower right side of FIG. 3 into another set of face sequences.
  • the present application may perform clustering processing according to facial features of at least part of the face sequence.
  • This application does not limit the implementation of clustering processing.
  • The merged face sequence still has a face feature. This face feature may be obtained by a weight-based calculation over all the face features in the face sequences that were merged together, or it may be calculated from the face features of the face sequences that were merged together.
  • the merged face sequence includes all of the face features in at least a portion of the face sequences that are merged with each other.
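  • Since the clustering method is left open, the following is only one possible sketch: face sequences whose sequence-level face features have a cosine similarity of at least `min_sim` are linked, and linked sequences are merged using a small union-find structure. The threshold, the similarity measure, and the name `cluster_sequences` are assumptions, not part of the application.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sequences(sequence_features, min_sim=0.8):
    """Group face sequences that likely belong to the same person.

    sequence_features: one sequence-level face feature per face sequence
    Returns a sorted list of groups, each a list of sequence indices.
    """
    n = len(sequence_features)
    parent = list(range(n))

    def find(i):
        # Find the group representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(sequence_features[i], sequence_features[j]) >= min_sim:
                parent[find(j)] = find(i)   # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

  • Each returned group corresponds to one merged face sequence, which would keep all face features of its member sequences.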
  • the operation S210 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a face sequence clustering module 530 executed by the processor.
  • S220: For a set of face sequences among the at least one set of face sequences after clustering, calculate the similarity between at least one face feature in the face sequence and at least one face feature in the preset face database.
  • The present application may select, from all the face sequences after clustering, the face sequence containing the largest number of faces, and calculate the similarity between at least part of the face features in the selected face sequence
  • and the face features of at least some of the persons in the preset face database.
  • For at least one face feature in a set of face sequences, select the highest similarity from the at least one similarity calculated for that face feature, vote for the persons in the face database corresponding to at least part of the highest similarities,
  • and determine, according to the voting result, the person in the face database to whom the face sequence belongs. The face features of all pictures in the face database of the person to whom the face sequence belongs form a face feature set.
  • S240: Obtain the face key points of a face to be processed from the face sequence, and determine, according to the face key points of the face to be processed and the face key points of at least one picture in the face feature set,
  • the face feature in the face feature set whose pose is most similar to that of the face to be processed; then calculate the similarity between the face feature of the face to be processed and that most pose-similar face feature.
  • S260: Determine, according to the similarities calculated above, the confidence that the face sequence belongs to a person in the face database; for example, calculate the average of all the similarities and use the average as the confidence.
  • The present application may also calculate the similarity between the face feature of the face sequence and the comprehensive face feature of the person in the face database to whom the face sequence belongs, and compare this similarity with the above confidence. If the similarity is less than the confidence, the similarity may be used as the confidence that the face sequence belongs to that person in the face database; otherwise, the confidence determined above is not corrected.
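  • The confidence calculation and its correction described above amount to taking the average of the pair similarities and then capping it by the sequence-level similarity. A minimal sketch, in which `sequence_confidence` and its parameter names are hypothetical:

```python
def sequence_confidence(pair_similarities, sequence_similarity):
    """Confidence that a face sequence belongs to a person in the database.

    pair_similarities:   similarities of the pose-matched face feature pairs
    sequence_similarity: similarity between the face feature of the sequence
                         and the person's comprehensive face feature
    """
    # Initial confidence: average of all pair similarities.
    confidence = sum(pair_similarities) / len(pair_similarities)
    # Correction: only a lower sequence-level similarity replaces the confidence.
    if sequence_similarity < confidence:
        confidence = sequence_similarity
    return confidence
```

  • With pair similarities 0.5 and 0.75 the initial confidence is 0.625; a sequence-level similarity of 0.5 lowers it to 0.5, while a sequence-level similarity of 0.875 leaves it unchanged.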
  • The operations S220, S230, S240, S250, and S260 may be performed by a processor invoking corresponding instructions stored in a memory, or by a face recognition module 510 executed by the processor.
  • FIG. 4 is a flow chart of one embodiment of forming a face sequence in the present application. As shown in FIG. 4, the method of this embodiment includes the following operations:
  • The flow of forming a face sequence begins, and an initialization operation is performed to initialize the face positions and face features of the previous frame; for example, the face positions and face features of the previous frame are each initialized to empty.
  • S420: Sequentially read a video frame from the video in the playback order of the video, and perform face detection on the video frame using a face detector.
  • S450: Calculate the similarity between each face feature whose displacement meets the predetermined requirement and the corresponding face feature in the previous frame. For a face image whose similarity satisfies the predetermined similarity requirement, perform the processing of operation S460; for a face image whose similarity does not satisfy the predetermined similarity requirement, perform the processing of operation S451.
  • any of the video-based face recognition methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • Any video-based face recognition method provided by the embodiments of the present application may be executed by a processor; for example, the processor performs any video-based face recognition method mentioned in the embodiments of the present application by executing corresponding instructions stored in a memory. This will not be repeated below.
  • The foregoing program may be stored in a computer-readable storage medium; when executed, the program performs
  • the operations of the foregoing method embodiments. The storage medium includes at least one medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 5 is a schematic structural view of an embodiment of the apparatus of the present application.
  • The apparatus of this embodiment mainly includes a face sequence forming module 500 and a face recognition module 510.
  • The apparatus may further include at least one of a face feature acquiring module 520 and a face sequence clustering module 530.
  • The face sequence forming module 500 is mainly used to form a set of face sequences for face images in a video that appear in a plurality of consecutive video frames and whose positions in those video frames meet a predetermined displacement requirement;
  • a face sequence is a set of face images, across a plurality of video frames, that belong to the same person.
  • The face sequence forming module 500 is configured to acquire face images of the same person in N consecutive video frames of the video (N is an integer greater than 2). Among the face images belonging to the same person, the face sequence forming module 500 determines the face image pairs whose position in the previous video frame and position in the next video frame meet the predetermined displacement requirement. If the intersection-over-union of the face image pairs that meet the predetermined displacement requirement, among the face images belonging to the same person, meets a preset ratio, the face sequence forming module 500 forms the face images belonging to the same person into a set of face sequences.
  • The face sequence forming module 500 can place such face images in the same set of face sequences.
  • The face images of the same person described above include face images whose face feature similarity meets the predetermined similarity requirement.
  • The face sequence forming module 500 can create a set of face sequences for each of one or more face images among the at least one face image in the video frame where a face first appears in the video.
  • The face sequence forming module 500 can create a set of face sequences for each of one or more face images, among the at least one face image in the next video frame, that do not appear in the previous video frame.
  • The face sequence forming module 500 can also assign the face images of the same person whose spatial positions are continuous between the previous video frame and the next video frame to the face sequence of that person.
  • For two adjacent video frames, the face sequence forming module 500 acquires the face feature of at least one face image in the previous video frame, the position of that face image in the previous video frame, the face feature of at least one face image in the next video frame, and the position of that face image in the next video frame. Based on the positions of the face images in the previous video frame and the positions of the face images in the next video frame, the face sequence forming module 500 determines the face image pairs whose displacement meets the predetermined displacement requirement. For a face image pair whose displacement meets the predetermined displacement requirement, if the similarity of the pair's face features satisfies the predetermined similarity requirement, the face sequence forming module 500 determines that
  • the face image in the next video frame of the pair belongs to the face sequence to which the face image in the previous video frame belongs.
  • For a face image pair whose displacement meets the predetermined displacement requirement, if the similarity of the pair's face features does not satisfy the predetermined similarity requirement, the face sequence forming module 500 creates a face sequence for the face image in the next video frame of the pair.
  • The face features used by the face sequence forming module 500 are provided by the face feature acquiring module 520.
  • The face feature acquiring module 520 is mainly configured to perform face detection on a video frame using a face detector to obtain the bounding box information of at least one face image in the video frame, and to provide the video frame and
  • the bounding box information of the at least one face image to a neural network for extracting face features, thereby obtaining the face features of at least one face image of the video frame via the neural network.
  • The face feature acquiring module 520 can also obtain the face key points of at least one face image of the video frame via the neural network.
  • the face recognition module 510 is mainly configured to perform face recognition using a preset face database according to a face feature in the face sequence for at least one set of face sequences.
  • The face database in the present application includes the face features of a plurality of persons; for any person, the face features of the person include a comprehensive face feature of the person and the face features of the person in different pictures.
  • the above comprehensive facial features include a weighted average of facial features of a person in different pictures.
  • First, for at least one face feature in a set of face sequences, the face recognition module 510 may calculate the similarity between the face feature and the integrated face feature of at least one person in the face database, and determine the person in the face database corresponding to the highest similarity. Second, the face recognition module 510 may vote according to the persons in the face database corresponding to the highest similarities determined for at least part of the face features in the face sequence, and take the person with the most votes as the person to whom the face sequence belongs. Then, the face recognition module 510 may determine the confidence that the face sequence belongs to that person, based on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face database, of the person to whom the face sequence belongs.
  • For example, for at least one face feature in the face sequence, the face recognition module 510 may calculate the similarity between that face feature and the face feature in the face feature set whose face pose is most similar to it, and determine the confidence that the face sequence belongs to the person according to the calculated similarity.
  • The face recognition module 510 can determine, according to the face key points in the face sequence and the face key points in the face feature set, the face features in the face feature set whose poses are most similar to those of the faces in the face sequence. In addition, the face recognition module 510 can correct the confidence that the face sequence belongs to the person by using the similarity between the face feature of the face sequence and the integrated face feature of the person to whom the face sequence belongs.
  • the face sequence clustering module 530 is mainly configured to perform clustering processing on at least part of the face sequence according to the face features of at least part of the face sequence to merge different face sequences corresponding to the same person; wherein, after the clustering process The different face sequences correspond to different people, and the clustered processed face sequences are provided to the face recognition module 510.
  • the face feature of the face sequence in the present application may be a weighted average of face features of at least part of the face image in the face sequence.
  • the weight of the face feature of at least part of the face image in the face sequence is determined according to the face image quality of at least part of the face image.
  • the face image quality herein includes at least one of a light intensity of a face image, a sharpness of a face image, and a face orientation.
  • device 600 includes one or more processors, a communication unit, and the like. The one or more processors may be one or more central processing units (CPUs) 601 and/or one or more graphics processing units (GPUs) 613, and the processor may perform operations according to executable instructions stored in read-only memory (ROM) 602 or executable instructions loaded from storage portion 608 into random access memory (RAM) 603.
  • the communication unit 612 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (Infiniband) network card.
  • The processor can communicate with read-only memory 602 and/or random access memory 603 to execute executable instructions, is connected to communication unit 612 via bus 604, and communicates with other target devices via communication unit 612, thereby completing the corresponding operations in the present application.
  • In addition, RAM 603 can store various programs and data required for the operation of the device.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • ROM 602 is an optional module.
  • the RAM 603 stores executable instructions or writes executable instructions to the ROM 602 at runtime, the executable instructions causing the central processing unit 601 to perform the operations included in the video-based face recognition method described above.
  • An input/output (I/O) interface 605 is also coupled to bus 604.
  • the communication unit 612 may be integrated, or may be configured to have a plurality of sub-modules (for example, a plurality of IB network cards) and be respectively connected to the bus.
  • The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet.
  • A drive 610 is also connected to the I/O interface 605 as needed.
  • A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it is installed in the storage portion 608 as needed.
  • The architecture shown in FIG. 6 is only one optional implementation.
  • In practice, the number and types of the components in FIG. 6 may be selected, deleted, added, or replaced according to actual needs;
  • for different functional components, separate or integrated arrangements may also be used.
  • For example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU; the communication unit may be arranged separately, or may be integrated on the CPU or the GPU.
  • A computer program contains program code for performing the operations shown in the flowchart. The program code may include instructions corresponding to the operations provided in the present application, for example: an instruction for forming a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, where the face sequence is a set of face images belonging to the same person in the plurality of video frames; and an instruction for performing, for a face sequence, face recognition using a preset face library, based at least on the face features in the face sequence.
  • the computer program can be downloaded and installed from the network via communication portion 609, and/or installed from removable media 611.
  • When the computer program is executed by the central processing unit (CPU) 601, the above-described instructions set out in the present application are executed.
  • The methods and apparatus, electronic devices, and computer-readable storage media of the present application may be implemented in many ways.
  • the methods and apparatus, electronic devices, and computer readable storage media of the present application can be implemented in software, hardware, firmware, or any combination of software, hardware, or firmware.
  • the above-described sequence of operations for the method is for illustrative purposes only, and the operation of the method of the present application is not limited to the order specifically described above unless otherwise specifically stated.
  • the present application can also be implemented as a program recorded in a recording medium, the program including machine readable instructions for implementing the method according to the present application.
  • the present application also covers a recording medium storing a program for executing the method according to the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

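A video-based face recognition method, apparatus, device, medium, and program, wherein the face recognition method includes: forming a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, where the face sequence is a set of face images belonging to the same person in the plurality of video frames; and, for a face sequence, performing face recognition using a preset face library, based at least on the face features in the face sequence.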

Description

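Video-based face recognition method, apparatus, device, medium, and program
This application claims priority to Chinese patent application No. CN 201711243717.9, filed with the Chinese Patent Office on November 30, 2017 and entitled "Video-based face recognition method, apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to computer vision technology, and in particular to a video-based face recognition method, a video-based face recognition apparatus, an electronic device, a computer-readable storage medium, and a computer program.
Background
Recognizing the people in a video can provide information support for a variety of applications. For example, recognizing the people in a video makes it possible to obtain the video's main characters; as another example, it makes it possible to classify and manage videos.
How to recognize the people in a video quickly and accurately is a technical problem worth attention.
Summary
Embodiments of the present application provide a technical solution for video-based face recognition.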
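According to one aspect of the embodiments of the present application, a video-based face recognition method is provided. The method mainly includes: forming a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, where the face sequence is a set of face images belonging to the same person in the plurality of video frames; and, for a face sequence, performing face recognition using a preset face library, based at least on the face features in the face sequence.
In one embodiment of the present application, the face images in the video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet the predetermined displacement requirement include: face images of the same person that appear in adjacent video frames, where the displacement between the position of that person's face image in the previous video frame and its position in the subsequent video frame meets the predetermined displacement requirement.
In another embodiment of the present application, the face images in the video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet the predetermined displacement requirement include: obtaining the face images belonging to the same person in N consecutive video frames of the video, N being an integer greater than two; determining, among the face images belonging to the same person, the face image pairs for which the displacement between the position in the previous video frame and the position in the subsequent video frame meets the predetermined displacement requirement; and, if the intersection-over-union ratio of the face image pairs that meet the predetermined displacement requirement, among the face images belonging to the same person, satisfies a preset ratio, forming the face images belonging to the same person into one face sequence.
In yet another embodiment of the present application, the face images of the same person include: face images whose face-feature similarity meets a predetermined similarity requirement.
In yet another embodiment of the present application, forming a face sequence from face images in the video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet the predetermined displacement requirement includes: creating a face sequence for each of one or more of the at least one face image in the video frame in which a face first appears in the video; creating a face sequence for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the subsequent video frame; and assigning face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence.
In yet another embodiment of the present application, assigning face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence includes: obtaining, for two adjacent video frames, the face features of at least one face image in the previous video frame, the position of the at least one face image in the previous video frame, the face features of at least one face image in the subsequent video frame, and the position of the at least one face image in the subsequent video frame; determining, from the positions of the at least one face image in the previous video frame and of the at least one face image in the subsequent video frame, the face image pairs whose displacement meets the predetermined displacement requirement; and, for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined to satisfy the predetermined similarity requirement, determining that the face image of the pair in the subsequent video frame belongs to the face sequence to which the face image in the previous video frame belongs.
In yet another embodiment of the present application, creating a face sequence for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the subsequent video frame includes: for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined not to satisfy the predetermined similarity requirement, creating a face sequence for the face image of the pair in the subsequent video frame.
In yet another embodiment of the present application, the method further includes obtaining face features, which includes: performing face detection on a video frame with a face detector to obtain bounding box information of at least one face image in the video frame; and providing the video frame and the bounding box information of the at least one face image to a neural network for extracting face features, and obtaining the face features of the at least one face image in the video frame via the neural network.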
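In yet another embodiment of the present application, after forming a face sequence and before performing, for a face sequence, face recognition using the preset face library based at least on the face features in the face sequence, the method further includes: clustering at least some of the face sequences according to their face features, so as to merge different face sequences that correspond to the same person; after the clustering, different face sequences correspond to different people.
In yet another embodiment of the present application, the face feature of a face sequence includes: a weighted average of the face features of at least some of the face images in the face sequence.
In yet another embodiment of the present application, the weights of the face features of the at least some face images in the face sequence are determined according to the face image quality of those face images.
In yet another embodiment of the present application, the face image quality includes at least one of: the light intensity of the face image, the sharpness of the face image, and the face orientation.
In yet another embodiment of the present application, the face library includes the face features of a plurality of people, and for any person, that person's face features include: the person's composite face feature, and the person's face features in different pictures.
In yet another embodiment of the present application, the composite face feature includes: a weighted average of the person's face features in different pictures.
In yet another embodiment of the present application, performing, for a face sequence, face recognition using the preset face library based at least on the face features in the face sequence includes: for at least one face feature in the face sequence, computing the similarity between that face feature and the composite face feature of at least one person in the face library, and determining the person in the face library corresponding to the highest similarity; voting over the people in the face library corresponding to the highest similarities determined for at least some of the face features in the face sequence, and taking the person with the most votes as the person to whom the face sequence belongs; and, for the face sequence, determining the confidence that the face sequence belongs to that person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs.
In yet another embodiment of the present application, determining the confidence that the face sequence belongs to the person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs, includes: for any face feature in the face sequence, computing the similarity between that face feature and the face feature in a face feature set whose face pose is most similar to that of the face feature; and determining the confidence that the face sequence belongs to the person according to the computed similarity; where the face feature set includes the person's face features in different pictures in the face library.
In yet another embodiment of the present application, determining the confidence that the face sequence belongs to the person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs, includes: determining, from the face key points in the face sequence and the face key points in the face feature set, the face feature in the face feature set whose face pose is most similar to that of a face in the face sequence.
In yet another embodiment of the present application, the method further includes: obtaining the face key points of at least one face image in a video frame via a neural network.
In yet another embodiment of the present application, determining the confidence that the face sequence belongs to the person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs, further includes: correcting the confidence that the face sequence belongs to the person using the similarity between the face feature of the face sequence and the composite face feature of the person to whom the face sequence belongs.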
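According to another aspect of the embodiments of the present application, a video-based face recognition apparatus is provided. The apparatus includes: a face sequence forming module, configured to form a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, where the face sequence is a set of face images belonging to the same person in the plurality of video frames; and a face recognition module, configured to perform, for a face sequence, face recognition using a preset face library, based at least on the face features in the face sequence.
In one embodiment of the present application, the apparatus further includes a face feature acquisition module, configured to: perform face detection on a video frame with a face detector to obtain bounding box information of at least one face image in the video frame; and provide the video frame and the bounding box information of the at least one face image to a neural network for extracting face features, and obtain the face features of the at least one face image in the video frame via the neural network.
In another embodiment of the present application, the apparatus further includes a face sequence clustering module, configured to cluster at least some of the face sequences according to their face features, so as to merge different face sequences that correspond to the same person; after the clustering, different face sequences correspond to different people, and the clustered face sequences are provided to the face recognition module.
According to yet another aspect of the embodiments of the present application, an electronic device is provided, including the apparatus according to any of the above embodiments.
According to yet another aspect of the embodiments of the present application, an electronic device is provided, including: a memory for storing a computer program; and a processor for executing the computer program stored in the memory, where the computer program, when executed, implements the method according to any of the above embodiments.
According to yet another aspect of the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program, when executed by a processor, implements the method according to any of the above embodiments.
According to yet another aspect of the embodiments of the present application, a computer program is provided, including computer instructions, where the computer instructions, when run in a processor of a device, implement the method according to any of the above embodiments.
Based on the video-based face recognition method, apparatus, electronic device, computer-readable storage medium, and computer program provided by the present application, the present application forms face sequences by exploiting the fact that the face of the same person in a video exhibits temporal continuity and spatial-position continuity, so that the faces of the same person appearing continuously in the video can be placed in the same face sequence quickly and accurately. By then performing face recognition with a face library on the face sequences obtained in this way, whether a person in the video is a person in the face library can be recognized quickly and accurately.
The technical solutions of the present application are described in further detail below with reference to the drawings and embodiments.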
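Brief Description of the Drawings
The accompanying drawings, which form a part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain its principles.
The present application can be understood more clearly from the following detailed description with reference to the drawings, in which:
FIG. 1 is a flowchart of an embodiment of the method of the present application;
FIG. 2 is a flowchart of another embodiment of the method of the present application;
FIG. 3 is a schematic diagram of an embodiment of face sequence clustering according to the present application;
FIG. 4 is a flowchart of an embodiment of forming face sequences according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the apparatus of the present application;
FIG. 6 is a block diagram of an exemplary device implementing an embodiment of the present application.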
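Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of components and operations, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
It should also be understood that, for ease of description, the dimensions of the parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be considered part of the specification.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be discussed further in subsequent drawings.
The embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, and servers, which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.
Electronic devices such as terminal devices, computer systems, and servers can be described in the general context of computer-system-executable instructions (such as program modules) executed by a computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, and so on, which perform particular tasks or implement particular abstract data types. The computer system or server can be implemented in a distributed cloud computing environment, in which tasks are performed by remote processing devices linked through a communication network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media that include storage devices.
Exemplary Embodiments
The technical solution for video-based face recognition provided by the present application is described below with reference to FIGS. 1 to 6.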
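FIG. 1 is a flowchart of an embodiment of the method of the present application. As shown in FIG. 1, the method of this embodiment includes operation S100 and operation S110.
S100. Form a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement. That is, the present application can form at least one face sequence based on the continuity of the face images in the video in time and spatial position. A face sequence is a set of face images belonging to the same person in the plurality of video frames.
In an optional example, the video in the present application may be an RGB video or a video in another format. In addition, the video may be of real people, in which case the faces in the corresponding face images may be real faces; the video may also be of drawn people, in which case the faces may be drawn faces, for example, the video may be an animated cartoon. The present application does not limit the form of the video.
In an optional example, the present application may obtain the face images belonging to the same person in N consecutive video frames of the video (N being an integer greater than 2), and determine, among the face images belonging to the same person, the face image pairs for which the displacement between the position in the previous video frame and the position in the subsequent video frame meets the predetermined displacement requirement; if the intersection-over-union ratio of the face image pairs that meet the predetermined displacement requirement, among the face images belonging to the same person, satisfies a preset ratio, the face images belonging to the same person form one face sequence.
In an optional example, the predetermined displacement requirement and the preset ratio may be related: when the predetermined displacement requirement is strict (for example, a relatively small displacement is required), the value range of the preset ratio may be relatively small; when the predetermined displacement requirement is loose (for example, a relatively large displacement is allowed), the value range of the preset ratio may be relatively large.
In an optional example, regarding the continuity of face images in time and spatial position: temporal continuity of face images in the present application usually means that face images of the same person appear in at least two video frames played consecutively. Spatial-position continuity usually means that face images of the same person appear at substantially the same position in two consecutively played video frames. That is, when face images of the same person appear in two adjacent video frames, and the displacement between the position of that person's face image in the previous video frame and its position in the subsequent video frame meets the predetermined displacement requirement, the face images can be considered continuous in time and spatial position. The face images of the same person may refer to face images whose face-feature similarity meets the predetermined similarity requirement. The predetermined displacement requirement may be set according to actual needs; for example, the position range in the previous video frame is enlarged by a predetermined factor (such as 1.1 to 1.5 times), and if the position in the subsequent video frame lies within the enlarged position range, the predetermined displacement requirement is considered met. In an optional example, based on the continuity of the face images in the video in time and spatial position, the present application usually forms one or more face sequences, and all face images in each face sequence belong to the same person. Each face sequence usually includes one or more face images. When the present application clusters all face sequences, different face sequences correspond to different people after clustering. When it does not cluster all face sequences, different face sequences may correspond to the same person. The flow of a video-based face recognition method that includes face sequence clustering is described below with reference to FIG. 2. By clustering the face sequences, the present application can quickly and accurately place all face images of one person in the video in one face sequence.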
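The enlarged-range check described above, and the intersection-over-union measure mentioned alongside it, can be sketched as follows. This is a minimal illustration rather than the application's implementation: the `Box` type, the function names, and the default factor of 1.3 (picked from the 1.1 to 1.5 range given as an example) are assumptions, and "position within the enlarged range" is interpreted here as the later box lying entirely inside the enlarged earlier box.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned face box given by its centre and size (hypothetical layout)."""
    cx: float
    cy: float
    w: float
    h: float

    def corners(self):
        # Returns (x1, y1, x2, y2).
        return (self.cx - self.w / 2, self.cy - self.h / 2,
                self.cx + self.w / 2, self.cy + self.h / 2)


def meets_displacement(prev: Box, curr: Box, factor: float = 1.3) -> bool:
    """True if curr lies inside prev enlarged by `factor` about its centre."""
    enlarged = Box(prev.cx, prev.cy, prev.w * factor, prev.h * factor)
    ex1, ey1, ex2, ey2 = enlarged.corners()
    x1, y1, x2, y2 = curr.corners()
    return x1 >= ex1 and y1 >= ey1 and x2 <= ex2 and y2 <= ey2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, in [0, 1]."""
    ax1, ay1, ax2, ay2 = a.corners()
    bx1, by1, bx2, by2 = b.corners()
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0
```

A stricter factor (closer to 1.1) accepts only small shifts between adjacent frames, while a looser one (closer to 1.5) tolerates faster motion at the cost of more candidate pairs for the feature-similarity check.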
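In an optional example, any face sequence formed by the present application usually includes the face feature of at least one face image, and may additionally include the face key points of at least one face image. Of course, the information contained in any face sequence formed by the present application may also be other information that can uniquely describe the characteristics of a person's face image.
In an optional example, the present application may use existing face feature extraction techniques and existing face key point detection techniques to obtain the face features and face key points of at least one face image in a video frame. For example, the present application may obtain the face features and face key points of at least one face image in a video frame through a face detector and a neural network for extracting face features. As an optional example, the present application may provide at least one video frame of the video to the face detector, which performs face detection on the input video frame; if the face detector detects a face image in the video frame, it outputs bounding box information for the at least one face image it detected (for example, the width and height of the bounding box and its center position). The present application may crop the corresponding video frame according to the bounding box information to obtain at least one face image block, and the at least one face image block may be resized and then input to the neural network for extracting face features, so that the face features of at least one face image in the corresponding video frame are obtained via the neural network. Further, the present application may also obtain the face key points of at least one face image in the video frame through the neural network; for example, for one face image in one video frame, 21, 68, 106, 186, 240, 220, or 274 face key points may be obtained. The present application may use an existing neural network to obtain the face features and face key points; the network structure of the neural network can be designed flexibly according to actual needs, and the present application does not limit it. For example, the neural network may include, but is not limited to, convolutional layers, nonlinear ReLU layers, pooling layers, fully connected layers, and the like; the more layers, the deeper the network. As another example, the network structure may adopt, but is not limited to, the structure of AlexNet, a Deep Residual Network (ResNet), or VGGnet (Visual Geometry Group Network). The present application does not limit how the face features and face key points of at least one face image in a video frame are obtained. In an optional example, the process of forming face sequences in the present application usually includes: creating face sequences, and determining the face sequence to which a face image in a video frame belongs.
One optional example of creating face sequences is: when the video frame in which a face first appears in the video is detected, create a face sequence for each of one or more of the at least one face image in that video frame. For example, if no face image is detected in video frames 1 to 4 of the video, and three face images, namely a first, a second, and a third face image, are detected starting from video frame 5, the present application creates a face sequence for each of these three face images, namely a first, a second, and a third face sequence. The first face sequence may contain the face features and face key points of the first face image, the second face sequence may contain those of the second face image, and the third face sequence may contain those of the third face image.
Another optional example of creating face sequences is: when at least one face image is detected that does not appear in the previous video frame but appears in the subsequent video frame, create a face sequence for each of one or more such detected face images. Continuing the previous example, if five face images, namely the first, second, third, fourth, and fifth face images, are detected in video frame 6, the present application creates a face sequence for each of the fourth and fifth face images, namely a fourth and a fifth face sequence; the fourth face sequence may contain the face features and face key points of the fourth face image, and the fifth face sequence may contain those of the fifth face image. As a further continuation, if three face images, namely the first, second, and fourth face images, are detected in video frame 6, the present application creates a face sequence for the fourth face image, namely a fourth face sequence, which may contain the face features and face key points of the fourth face image.
One optional example of determining the face sequence to which a face image in a video frame belongs is: assign face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence. Continuing the previous example, if five face images, namely the first, second, third, fourth, and fifth face images, are detected in video frame 6, the present application may assign the first face image in video frame 6, whose position is continuous with its position in video frame 5, to the first face sequence, and likewise assign the second face image to the second face sequence and the third face image to the third face sequence. For example, the present application may add the face features and face key points of the first face image in video frame 6 to the first face sequence, those of the second face image to the second face sequence, and those of the third face image to the third face sequence. As a further continuation, if three face images, namely the first, second, and fourth face images, are detected in video frame 6, the present application may assign the first face image in video frame 6, whose position is continuous with its position in video frame 5, to the first face sequence, and the second face image to the second face sequence; for example, it may add the face features and face key points of the first face image in video frame 6 to the first face sequence, and those of the second face image to the second face sequence.
In an optional example, while determining the face sequence to which a face image in a video frame belongs, the present application needs to assign face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence. One optional implementation is as follows. For any two adjacent video frames in the video, first obtain the face features of at least one face image in the previous video frame, the position of the at least one face image in the previous video frame, the face features of at least one face image in the subsequent video frame, and the position of the at least one face image in the subsequent video frame. Then, from these positions, determine the face image pairs whose displacement meets the predetermined displacement requirement. If no face image pair meets the predetermined displacement requirement, it can be determined that a new face sequence needs to be created for the face image in the subsequent video frame. If there are face image pairs that meet the predetermined displacement requirement, then for each such pair, determine whether the similarity of the pair's face features satisfies the predetermined similarity requirement: if it does, it can be determined that the face image of the pair in the subsequent video frame belongs to the face sequence to which the face image in the previous video frame belongs; if it does not, it can be determined that a new face sequence needs to be created for the face image of the pair in the subsequent video frame.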
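The two-frame assignment just described can be sketched as below. The sketch is only illustrative: the `Face` and `FaceSequence` types, cosine similarity as the feature similarity, the 0.6 threshold, and the centre-shift displacement test are all assumptions, since the application leaves these choices open.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Face:
    box: tuple      # (x1, y1, x2, y2) in pixels
    feature: list   # face feature vector from some extractor


@dataclass
class FaceSequence:
    faces: list = field(default_factory=list)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centre_shift_ok(prev_box, curr_box, max_shift=5.0):
    """Placeholder displacement test: box centres moved at most max_shift pixels."""
    pcx, pcy = (prev_box[0] + prev_box[2]) / 2, (prev_box[1] + prev_box[3]) / 2
    ccx, ccy = (curr_box[0] + curr_box[2]) / 2, (curr_box[1] + curr_box[3]) / 2
    return math.hypot(ccx - pcx, ccy - pcy) <= max_shift


def assign_frame(prev, curr_faces, sim_threshold=0.6,
                 displacement_ok=centre_shift_ok):
    """Assign each current face to a previous frame's sequence or to a new one.

    prev is a list of (Face, FaceSequence) pairs from the previous frame.
    Returns the same kind of list for the current frame.
    """
    result = []
    for face in curr_faces:
        target = None
        for prev_face, seq in prev:
            if (displacement_ok(prev_face.box, face.box)
                    and cosine(prev_face.feature, face.feature) >= sim_threshold):
                target = seq
                break
        if target is None:
            target = FaceSequence()   # no pair passed both tests
        target.faces.append(face)
        result.append((face, target))
    return result
```

Calling `assign_frame` once per frame, with the previous call's return value as `prev`, walks the video in order. The sketch does not guard against two faces in the same frame matching one earlier sequence; a fuller version would resolve such conflicts explicitly.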
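The way the present application forms face sequences allows at least one face image in at least one video frame of the video to be placed quickly and accurately in the corresponding face sequence, which helps improve the efficiency and accuracy of face recognition. For an optional implementation of forming face sequences, see the description of FIG. 4 below.
In an optional example, operation S100 may be performed by a processor invoking corresponding instructions stored in a memory, or by the face sequence forming module 500 run by the processor.
S110. For a face sequence, perform face recognition using a preset face library, based at least on the face features in the face sequence.
In an optional example, the present application may perform face recognition using the preset face library, based on the face feature of the face sequence and the face features of at least some of the face images in the face sequence. That is, a face sequence in the present application has a face feature of its own; the face feature of a face sequence is usually obtained by jointly considering the face features of at least some of the face images in the sequence. For example, the face features of at least some face images in the sequence each have a corresponding weight (that is, those face images each have a weight); by weighting the face features of those face images with the weights, a weighted average of the face features is obtained, and the present application may use this weighted average as the face feature of the face sequence. The face feature of a face sequence may also be called the composite face feature of the face sequence.
Optionally, the present application may determine the weight of each face feature of the at least some face images in the face sequence according to the face image quality of those face images. The face image quality in the present application may include one or more of: the light intensity of the face image, the sharpness of the face image, and the face orientation. The face orientation may be obtained from the face key points; for example, the present application may apply an existing calculation method to the face key points to obtain the face orientation.
In general, the higher the face image quality, the larger the weight of the face image; the lower the face image quality, the smaller the weight. For example, the sharper the face image, the more moderate its light intensity, and the smaller the face orientation angle (that is, the closer to a frontal face), the larger the weight of the face image; the blurrier the face image, the more its light intensity is too high or too low, and the larger the face orientation angle (that is, the further from a frontal face), the smaller the weight. When the face image quality includes the light intensity, sharpness, and face orientation, the proportion each contributes to the weight may be determined according to the actual situation. The present application does not limit the relationship between face image quality and weight. In addition, the present application may use existing techniques to evaluate the face image quality of at least some of the face images in a face sequence, and does not limit how face image quality is evaluated. Furthermore, the present application may obtain the face feature of a face sequence in ways other than a weighted average; for example, by averaging the face features of at least some of the face images in the face sequence, an average face feature is obtained, and the present application may use this average face feature as the face feature of the face sequence. The present application does not limit how the face feature of a face sequence is determined from at least some of the face features in the sequence.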
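A quality-weighted sequence feature along the lines described above might look like the following sketch. The scoring formula in `quality_weight` is purely an assumption made for illustration (the application does not specify how sharpness, light intensity, and orientation are combined); only the weighted average itself follows directly from the text.

```python
import math


def quality_weight(sharpness, brightness, yaw_deg):
    """Hypothetical quality score in [0, 1].

    sharpness and brightness are assumed normalised to [0, 1]; brightness 0.5
    is treated as ideal, and yaw_deg is the angle away from a frontal face.
    """
    light = 1.0 - 2.0 * abs(brightness - 0.5)
    frontal = max(0.0, math.cos(math.radians(yaw_deg)))
    return max(0.0, sharpness * light * frontal)


def weighted_mean_feature(features, weights):
    """Weighted average of equal-length feature vectors."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(features[0])
    return [sum(w * f[i] for f, w in zip(features, weights)) / total
            for i in range(dim)]
```

Passing equal weights reduces `weighted_mean_feature` to the plain average that the text mentions as an alternative.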
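By setting weights according to face image quality and using the weights to form the face feature of a face sequence, the present application can avoid the adverse effect of poor-quality face features in the sequence on face recognition. For example, completely ignoring poor-quality face features may reduce accuracy when determining the main characters of a video; as another example, treating poor-quality face features the same as good-quality ones may reduce how accurately the sequence's face feature describes the face.
In an optional example, regardless of whether the face sequences in this operation have been clustered, for any one of the face sequences the present application may perform face recognition using the preset face library, based on the face feature of the face sequence and the face features of at least some of the face images in the sequence, for example, to recognize the person to whom the face belongs and the confidence that it belongs to that person. As an optional example, for any one of the face sequences, the present application may determine, from the face feature of the face sequence and the face features of at least some of its face images, the confidence that the person corresponding to the face sequence is a person in the preset face library, and from that confidence determine whether the person corresponding to the face sequence is a person in the face library. The present application may make this determination for every face sequence, for only some of the face sequences, or only for the face sequence that contains the largest number of face images.
An optional example is as follows:
First, the present application has a preset face library that contains the face features of a plurality of people. For any person in the face library, that person's face features in the library usually have two parts: one part is the person's composite face feature, and the other part is the person's face features in different pictures (such as photographs or video frames). The composite face feature is usually obtained by jointly considering the person's face features in different pictures. For example, each of the person's face features in different pictures has a corresponding weight; by weighting those face features with the weights, a weighted average of the person's face features in different pictures is obtained, and the present application may use this weighted average as the person's composite face feature in the face library. Optionally, the present application may determine the weight of each of a face's features in different pictures according to the face image quality of the face in those pictures. The face image quality may include one or more of: the light intensity of the face image, the sharpness of the face image, and the face orientation. In general, the higher the face image quality, the larger the weight of the face image; the lower the face image quality, the smaller the weight. The present application does not limit the relationship between face image quality and weight. In addition, the present application may use existing techniques to evaluate the face image quality of the faces in the face library in different pictures, and does not limit how face image quality is evaluated. Furthermore, the present application may obtain each person's composite face feature in the face library in ways other than a weighted average; for example, by averaging a person's face features in different pictures in the face library, an average face feature for that person is obtained, and the present application may use this average face feature as the person's composite face feature in the face library. The present application does not limit how the composite face feature of any person in the face library is determined. The face library of the present application also includes the face key points of the faces in different pictures. Optionally, the composite face feature may be determined in the same way as the face feature of a face sequence.
Second, for a face sequence (for example, any face sequence, or the face sequence containing the largest number of face images), compute the similarity between at least one face feature in the face sequence (for example, every face feature) and the composite face feature of at least one person in the face library, select the highest similarity for the at least one face feature (for example, for each face feature), and vote using the people in the face library corresponding to the highest similarities; the present application may determine the person to whom the face sequence belongs from the voting result. As an optional example, for any one of the face sequences, compute the similarity between the first face feature in the face sequence and the composite face feature of the first person in the face library, of the second person, and so on up to the last person (for example, the N-th person), obtaining N similarities, and pick the highest of these N similarities; compute the similarity between the second face feature in the face sequence and the composite face feature of the first person, the second person, and so on up to the last person, again obtaining N similarities, and pick the highest; and so on, until the similarities between the M-th (for example, the last) face feature in the face sequence and the composite face features of the first person, the second person, and so on up to the last person have been computed, obtaining N similarities, from which the highest is picked. The present application thus obtains M highest similarities, and votes using the people in the face library corresponding to these M highest similarities. For example, if M-1 (M>2) of the highest similarities correspond to the first person in the face library and only one corresponds to the second person, the present application may determine from this vote that the face sequence belongs to the first person in the face library. The present application does not limit how the similarity between a face feature in a face sequence and a composite face feature is computed.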
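The voting step can be sketched in a few lines. Cosine similarity and the dictionary layout of the face library are assumptions made for this illustration; the application explicitly leaves the similarity computation open.

```python
import math
from collections import Counter


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vote_identity(sequence_features, composite_features):
    """Pick the library person who wins the most per-face votes.

    composite_features maps a person id to that person's composite feature.
    Each face feature in the sequence votes for its most similar person.
    """
    votes = Counter()
    for feat in sequence_features:
        best = max(composite_features,
                   key=lambda pid: cosine(feat, composite_features[pid]))
        votes[best] += 1
    winner, _ = votes.most_common(1)[0]
    return winner, votes
```

With M face features and N people this performs M × N similarity computations, matching the walk-through above. Ties are broken here by whichever person was counted first, which is an arbitrary choice of this sketch.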
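Third, the present application may determine the confidence that the face sequence belongs to the person in the face library, based on at least one face feature in the face sequence and the face features, in different pictures, of the person in the face library to whom the face sequence belongs. As an optional example, for ease of description, the face features in different pictures of the person in the face library to whom the face sequence belongs can be regarded as a face feature set. For the face feature of each face image in the face sequence, the present application searches the face feature set for the face feature whose face pose is most similar to that of the face feature; a face feature in the face sequence and the face feature in the set with the most similar face pose form a face feature pair, and the similarity (for example, a confidence) between the two face features of the pair is computed. The present application can thus obtain one similarity for each of one or more face features in the face sequence, and can determine the confidence that the face sequence belongs to the person in the face library from all the similarities computed for the face sequence. In addition, the present application may correct this confidence using the face feature of the face sequence. As an optional example, the present application may compute the similarity between the face feature of the face sequence and the composite face feature of the person in the face library to whom the face sequence belongs; when this similarity is determined to be smaller than the confidence, this similarity may be taken as the confidence that the face sequence belongs to the person in the face library, and when this similarity is determined to be not smaller than the confidence, the confidence is not updated with it.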
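The confidence computation and its correction can be sketched as follows. The dictionary layout of a face, cosine similarity, and the caller-supplied `pose_distance` function are assumptions of this sketch; averaging the per-face similarities is the example the application gives for operation S260, and other aggregations would fit the text equally well.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def sequence_confidence(seq_faces, library_faces, seq_feature,
                        composite_feature, pose_distance):
    """Confidence that a face sequence belongs to the voted person.

    seq_faces and library_faces are lists of dicts with 'feature' and
    'keypoints' entries; library_faces holds the voted person's faces from
    different pictures. pose_distance(kp_a, kp_b) returns a smaller value
    for more similar poses.
    """
    sims = []
    for face in seq_faces:
        # Pair each sequence face with the library face of most similar pose.
        nearest = min(library_faces,
                      key=lambda g: pose_distance(face["keypoints"], g["keypoints"]))
        sims.append(cosine(face["feature"], nearest["feature"]))
    confidence = sum(sims) / len(sims)
    # Correction: the sequence-level similarity can only lower the confidence.
    check = cosine(seq_feature, composite_feature)
    return min(confidence, check)
```

Taking the minimum implements the rule that the sequence-level similarity replaces the confidence only when it is smaller.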
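By determining the confidence from at least one face feature in the face sequence and the face feature in the face feature set with the most similar face pose, the present application can avoid the effect of differences in face pose on the accuracy of the confidence computation. By computing the similarity between the face feature of the face sequence and the composite face feature of the person in the face library to whom the face sequence belongs, and using this similarity to correct the determined confidence, the present application can avoid errors in deciding that a face sequence belongs to a person in the face library that arise when the faces in the sequence all have too uniform an orientation (for example, all facing left). This helps improve the accuracy of face recognition.
In an optional example, the present application may determine the face pose from the face key points, and thereby determine the face feature in the face feature set whose face pose is most similar to that of a face in the face sequence. For example, the face sequence includes the face feature and face key points of each face (referred to below as first face key points), and the face library includes each person's face features and face key points in different pictures (referred to below as second face key points). The present application may map the first face key points of a face in the face sequence onto a standard blank image, and likewise map the second face key points of the person in the face library in different pictures onto standard blank images; then, by comparing the positions of at least one key point among the first face key points with the positions of at least one key point among the at least one set of second face key points, the face feature with the most similar pose can be selected from the face feature set according to the comparison result. The face pose in the present application usually reflects the face orientation, facial expression, and the like; the face pose is usually determined by the face key points, and the present application may regard the face pose as the physical meaning of the face key points.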
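One way to compare poses through key points mapped onto a common canvas is sketched below. The normalisation (shifting and uniformly scaling the key points into a square canvas) and the mean point-to-point distance are assumptions chosen for illustration; the application only states that key points are mapped onto a standard blank image and their positions compared.

```python
import math


def to_standard_canvas(points, size=100.0):
    """Shift and uniformly scale (x, y) key points into a size-by-size canvas."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    scale = max(max(xs) - min_x, max(ys) - min_y) or 1.0
    return [((x - min_x) / scale * size, (y - min_y) / scale * size)
            for x, y in points]


def pose_distance(kp_a, kp_b):
    """Mean distance between corresponding key points after normalisation."""
    a = to_standard_canvas(kp_a)
    b = to_standard_canvas(kp_b)
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def most_similar_pose(query_kp, library_kps):
    """Index of the library key point set whose pose is closest to the query."""
    return min(range(len(library_kps)),
               key=lambda i: pose_distance(query_kp, library_kps[i]))
```

Both key point sets must list the same landmarks in the same order. Because of the normalisation, the same face at a different image position or scale yields a distance of zero, so only the relative layout of the key points (the pose) is compared.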
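The present application may determine, from the finally determined confidence, whether any face in the video belongs to the corresponding person in the face library. The result can be used in a variety of applications, for example, determining the main characters of a video, determining all the people in a video, or determining the videos related to a particular person, so that the present application can enable automatic management of videos.
In an optional example, operation S110 may be performed by a processor invoking corresponding instructions stored in a memory, or by the face recognition module 510 run by the processor.
FIG. 2 is a flowchart of another embodiment of the method of the present application. As shown in FIG. 2, the method of this embodiment mainly includes operations S200, S210, S220, S230, S240, S250, and S260.
S200. Form a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement. That is, the present application can form at least one face sequence based on the continuity of the faces in the video in time and spatial position.
In an optional example, each face sequence has one face feature, which is the face feature of the face sequence; at the same time, each face sequence usually includes the face features of one or more face images. All face features in each face sequence belong to the same person.
In an optional example, operation S200 may be performed by a processor invoking corresponding instructions stored in a memory, or by the face sequence forming module 500 run by the processor.
S210. Cluster at least some of the face sequences, so as to merge the different face sequences among them that correspond to the same person.
As shown in FIG. 3, each circle represents one face sequence. The faces of the first and second main characters in the video may form 11 face sequences because of breaks in time or spatial position, shown as the 11 circles on the left of FIG. 3. During clustering, these 11 face sequences are divided into two classes: one class formed by the 6 face sequences at the upper right of FIG. 3, and another class formed by the 5 face sequences at the lower right of FIG. 3. The present application may merge the 6 face sequences at the upper right of FIG. 3 into one face sequence, and merge the 5 face sequences at the lower right of FIG. 3 into another face sequence.
In an optional example, the present application may perform clustering according to the face features of at least some of the face sequences. The present application does not limit how clustering is implemented. After clustering, a merged face sequence still has a face feature; this face feature may be computed with weights from all the face features in the face sequences that were merged together, or computed from the face features of the face sequences that were merged together. A merged face sequence includes all the face features in the face sequences that were merged together.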
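Since the application does not fix a clustering method, the following sketch shows just one simple possibility: sequences whose sequence-level features are similar enough are linked, and linked sequences are grouped with a union-find structure. The cosine similarity, the 0.8 threshold, and the function names are all assumptions of this illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_sequences(seq_features, threshold=0.8):
    """Group sequence indices whose features are linked by similarity >= threshold."""
    n = len(seq_features)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(seq_features[i], seq_features[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Each returned group lists the indices of face sequences to merge into one; the merged sequence's own feature would then be recomputed from the pooled face features, as described above.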
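In an optional example, operation S210 may be performed by a processor invoking corresponding instructions stored in a memory, or by the face sequence clustering module 530 run by the processor.
S220. For one of the at least one face sequence obtained after clustering, compute the similarity between at least one face feature in the face sequence and at least one face feature in the preset face library.
In an optional example, the present application may select, from all the face sequences obtained after clustering, the face sequence containing the largest number of faces, and compute the similarity between at least some of the face features in the selected face sequence and the face features of at least some of the people in the preset face library.
S230. For at least one face feature in the face sequence, select the highest similarity from the at least one similarity computed for that face feature, vote over the people in the face library corresponding to at least some of the highest similarities, and determine from the voting result the person in the face library to whom the face sequence belongs. All pictures in the face library of the person to whom the face sequence belongs form a face feature set.
S240. Obtain the face key points of a face to be processed from the face sequence; determine, from the face key points of the face to be processed and the face key points of at least one picture in the face feature set, the face feature in the face feature set whose pose is most similar to that of the face to be processed; and compute the similarity between the face feature of the face to be processed and the face feature with the most similar pose.
S250. Determine whether there are still faces to be processed in the face sequence; if so, perform operation S240; otherwise, perform operation S260.
S260. Determine the confidence that the face sequence belongs to the person in the face library from the similarities computed above; for example, compute the average of all the similarities and take the average as the confidence. In addition, the present application may compute the similarity between the face feature of the face sequence and the composite face feature of the person in the face library to whom the face sequence belongs, and compare this similarity with the confidence: if the similarity is smaller than the confidence, the similarity may be taken as the confidence that the face sequence belongs to the person in the face library; otherwise, the determined confidence is not corrected.
In an optional example, operations S220, S230, S240, S250, and S260 may be performed by a processor invoking corresponding instructions stored in a memory, or by the face recognition module 510 run by the processor.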
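FIG. 4 is a flowchart of an embodiment of forming face sequences according to the present application. As shown in FIG. 4, the method of this embodiment includes the following operations:
S400. The flow of forming face sequences begins. Perform initialization: initialize the previous-frame face positions and face features, for example, by setting both to empty.
S410. Determine whether there is currently an unread video frame in the video. If there is, perform operation S420; if there is not, perform operation S480.
S420. Read one video frame from the video in playback order, and perform face detection on the video frame with the face detector.
S430. Determine whether a face image is detected. If a face image is detected, perform operation S440; if no face image is detected, return to operation S410.
S440. Use the neural network to obtain the face features and face key points of the at least one face image detected in the video frame, and compare the face position of each detected face image with the previous-frame face positions. For each detected face image whose displacement meets the predetermined displacement requirement, perform the processing of operation S450; for each detected face image whose displacement does not meet the predetermined displacement requirement, perform the processing of operation S441.
S441. For one or more of the at least one face image whose displacement does not meet the predetermined displacement requirement, create a face sequence for each, and add the face features and face key points of each such face image to the newly created corresponding face sequence. Perform operation S460.
S450. Compute the face-feature similarity from the at least one face feature whose displacement meets the predetermined requirement and the corresponding face feature in the previous frame. For face images whose similarity satisfies the predetermined similarity requirement, perform the processing of operation S460; for face images whose similarity does not satisfy the predetermined similarity requirement, perform the processing of operation S451.
S451. For one or more of the at least one face image whose similarity does not satisfy the predetermined similarity requirement, create a face sequence for each, and add the face features and face key points of each such face to the newly created corresponding face sequence. Perform operation S470.
S460. For the at least one face image whose similarity satisfies the predetermined similarity requirement, add the face features and face key points of each such face image to the face sequence containing the corresponding face of the previous frame. Perform operation S470.
S470. Update the previous-frame face positions and face features with the face positions and face features of the at least one face image detected this time. Return to operation S410.
S480. The flow of forming face sequences ends.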
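Any video-based face recognition method provided by the embodiments of the present application may be performed by any suitable device with data processing capability, including but not limited to terminal devices and servers. Alternatively, any video-based face recognition method provided by the embodiments of the present application may be performed by a processor, for example, by the processor invoking corresponding instructions stored in a memory. This is not repeated below.
Those of ordinary skill in the art will understand that all or some of the operations of the above method embodiments may be carried out by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the operations of the above method embodiments. The storage medium includes at least one kind of medium that can store program code, such as ROM, RAM, a magnetic disk, or an optical disk.
FIG. 5 is a schematic structural diagram of an embodiment of the apparatus of the present application. As shown in FIG. 5, the apparatus of this embodiment mainly includes a face sequence forming module 500 and a face recognition module 510. Optionally, the apparatus may further include at least one of a face feature acquisition module 520 and a face sequence clustering module 530.
The face sequence forming module 500 is mainly configured to form a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, where the face sequence is a set of face images belonging to the same person in the plurality of video frames.
Optionally, the face sequence forming module 500 may obtain the face images belonging to the same person in N consecutive video frames of the video (N being an integer greater than 2), and determine, among the face images belonging to the same person, the face image pairs for which the displacement between the position in the previous video frame and the position in the subsequent video frame meets the predetermined displacement requirement; if the intersection-over-union ratio of the face image pairs that meet the predetermined displacement requirement, among the face images belonging to the same person, satisfies a preset ratio, the face sequence forming module 500 forms the face images belonging to the same person into one face sequence.
Optionally, if face images of the same person appear in adjacent video frames, and the displacement between the position of that person's face image in the previous video frame and its position in the subsequent video frame meets the predetermined displacement requirement, the face sequence forming module 500 may place such face images in the same face sequence.
The face images of the same person include: face images whose face-feature similarity meets the predetermined similarity requirement.
In an optional example, the face sequence forming module 500 may create a face sequence for each of one or more of the at least one face image in the video frame in which a face first appears in the video. The face sequence forming module 500 may create a face sequence for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the subsequent video frame. The face sequence forming module 500 may also assign face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence.
In an optional example, the face sequence forming module 500 obtains, for two adjacent video frames, the face features of at least one face image in the previous video frame, the position of the at least one face image in the previous video frame, the face features of at least one face image in the subsequent video frame, and the position of the at least one face image in the subsequent video frame; the face sequence forming module 500 determines, from these positions, the face image pairs whose displacement meets the predetermined displacement requirement; and, for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined to satisfy the predetermined similarity requirement, the face sequence forming module 500 determines that the face image of the pair in the subsequent video frame belongs to the face sequence to which the face image in the previous video frame belongs.
In an optional example, for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined not to satisfy the predetermined similarity requirement, the face sequence forming module 500 creates a face sequence for the face image of the pair in the subsequent video frame.
The face features used by the face sequence forming module 500 are provided by the face feature acquisition module 520. The face feature acquisition module 520 is mainly configured to perform face detection on a video frame with a face detector to obtain bounding box information of at least one face image in the video frame, provide the video frame and the bounding box information to a neural network for extracting face features, and obtain the face features of the at least one face image in the video frame via the neural network. The face feature acquisition module 520 may also obtain the face key points of the at least one face image in the video frame via the neural network.
The face recognition module 510 is mainly configured to perform, for a face sequence, face recognition using a preset face library, based at least on the face features in the face sequence.
In an optional example, the face library in the present application includes the face features of a plurality of people, and for any person, that person's face features include: the person's composite face feature, and the person's face features in different pictures. The composite face feature includes: a weighted average of the person's face features in different pictures.
In an optional example, first, the face recognition module 510 may, for at least one face feature in a face sequence, compute the similarity between that face feature and the composite face feature of at least one person in the face library, and determine the person in the face library corresponding to the highest similarity. Second, the face recognition module 510 may vote over the people in the face library corresponding to the highest similarities determined for at least some of the face features in the face sequence, and take the person with the most votes as the person to whom the face sequence belongs. Then, the face recognition module 510 may, for the face sequence, determine the confidence that the face sequence belongs to that person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs. For example, the face recognition module 510 may, for at least one face feature in the face sequence, compute the similarity between that face feature and the face feature in the face feature set whose face pose is most similar to that of the face feature, and determine the confidence that the face sequence belongs to the person according to the computed similarity; the face feature set may include the person's face features in different pictures in the face library.
In addition, the face recognition module 510 may determine, from the face key points in the face sequence and the face key points in the face feature set, the face feature in the face feature set whose face pose is most similar to that of a face in the face sequence. Furthermore, the face recognition module 510 may correct the confidence that the face sequence belongs to the person using the similarity between the face feature of the face sequence and the composite face feature of the person to whom the face sequence belongs.
The face sequence clustering module 530 is mainly configured to cluster at least some of the face sequences according to their face features, so as to merge different face sequences that correspond to the same person; after the clustering, different face sequences correspond to different people, and the clustered face sequences are provided to the face recognition module 510. The face feature of a face sequence in the present application may be a weighted average of the face features of at least some of the face images in the face sequence. The weights of the face features of the at least some face images in the face sequence are determined according to the face image quality of those face images. The face image quality here includes at least one of: the light intensity of the face image, the sharpness of the face image, and the face orientation.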
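Exemplary Device
FIG. 6 shows an exemplary device 600 suitable for implementing the present application. The device 600 may be a control system or electronic system configured in an automobile, a mobile terminal (for example, a smartphone), a personal computer (PC, for example, a desktop or notebook computer), a tablet computer, a server, or the like. In FIG. 6, the device 600 includes one or more processors, a communication unit, and the like. The one or more processors may be one or more central processing units (CPUs) 601 and/or one or more graphics processing units (GPUs) 613, and the processor may perform various appropriate actions and processing according to executable instructions stored in read-only memory (ROM) 602 or executable instructions loaded from the storage portion 608 into random access memory (RAM) 603. The communication unit 612 may include, but is not limited to, a network card, and the network card may include, but is not limited to, an IB (Infiniband) network card. The processor can communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, is connected to the communication unit 612 via the bus 604, and communicates with other target devices via the communication unit 612, thereby completing the corresponding operations in the present application.
For the operations performed by the above instructions, see the relevant description in the above method embodiments; they are not described in detail here.
In addition, the RAM 603 may also store various programs and data required for the operation of the apparatus. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through the bus 604. When the RAM 603 is present, the ROM 602 is an optional module. The RAM 603 stores executable instructions, or executable instructions are written to the ROM 602 at runtime; the executable instructions cause the central processing unit 601 to perform the operations included in the video-based face recognition method described above. An input/output (I/O) interface 605 is also connected to the bus 604. The communication unit 612 may be integrated, or may be configured with a plurality of sub-modules (for example, a plurality of IB network cards), each connected to the bus.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it is installed in the storage portion 608 as needed.
It should be noted in particular that the architecture shown in FIG. 6 is only one optional implementation. In practice, the number and types of the components in FIG. 6 may be selected, deleted, added, or replaced according to actual needs. For different functional components, separate or integrated arrangements may also be used; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication unit may be arranged separately, or may be integrated on the CPU or the GPU. These alternative implementations all fall within the scope of protection of the present application.
In particular, according to embodiments of the present application, the process described below with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present application include a computer program product that includes a computer program tangibly embodied on a machine-readable medium. The computer program contains program code for performing the operations shown in the flowchart, and the program code may include instructions corresponding to the operations provided in the present application, for example: an instruction for forming a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, where the face sequence is a set of face images belonging to the same person in the plurality of video frames; and an instruction for performing, for a face sequence, face recognition using a preset face library, based at least on the face features in the face sequence.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described instructions set out in the present application are executed.
The methods and apparatus, electronic devices, and computer-readable storage media of the present application may be implemented in many ways. For example, they may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the operations of the method is for illustration only, and the operations of the method of the present application are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present application may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present application. Thus, the present application also covers a recording medium storing a program for executing the method according to the present application.
The description of the present application is given for the purposes of illustration and description, and is not intended to be exhaustive or to limit the present application to the disclosed form. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to better explain the principles and practical application of the present application, and to enable those of ordinary skill in the art to understand the present application and thereby design various embodiments with various modifications suited to particular uses.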

Claims (42)

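  1. A video-based face recognition method, characterized in that the method comprises:
    forming a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, wherein the face sequence is a set of face images belonging to the same person in the plurality of video frames;
    for a face sequence, performing face recognition using a preset face library, based at least on the face features in the face sequence.
  2. The method according to claim 1, characterized in that the face images in the video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet the predetermined displacement requirement comprise:
    face images of the same person that appear in adjacent video frames, wherein the displacement between the position of that person's face image in the previous video frame and its position in the subsequent video frame meets the predetermined displacement requirement.
  3. The method according to any one of claims 1 to 2, characterized in that the face images in the video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet the predetermined displacement requirement comprise:
    obtaining the face images belonging to the same person in N consecutive video frames of the video, N being an integer greater than two;
    determining, among the face images belonging to the same person, the face image pairs for which the displacement between the position in the previous video frame and the position in the subsequent video frame meets the predetermined displacement requirement;
    if the intersection-over-union ratio of the face image pairs that meet the predetermined displacement requirement, among the face images belonging to the same person, satisfies a preset ratio, forming the face images belonging to the same person into one face sequence.
  4. The method according to any one of claims 1 to 3, characterized in that the face images of the same person comprise: face images whose face-feature similarity meets a predetermined similarity requirement.
  5. The method according to any one of claims 1 to 4, characterized in that forming a face sequence from face images in the video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet the predetermined displacement requirement comprises:
    creating a face sequence for each of one or more of the at least one face image in the video frame in which a face first appears in the video;
    creating a face sequence for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the subsequent video frame;
    assigning face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence.
  6. The method according to claim 5, characterized in that assigning face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence comprises:
    obtaining, for two adjacent video frames, the face features of at least one face image in the previous video frame, the position of the at least one face image in the previous video frame, the face features of at least one face image in the subsequent video frame, and the position of the at least one face image in the subsequent video frame;
    determining, from the position of the at least one face image in the previous video frame and the position of the at least one face image in the subsequent video frame, the face image pairs whose displacement meets the predetermined displacement requirement;
    for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined to satisfy the predetermined similarity requirement, determining that the face image of the pair in the subsequent video frame belongs to the face sequence to which the face image in the previous video frame belongs.
  7. The method according to claim 5 or 6, characterized in that creating a face sequence for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the subsequent video frame comprises:
    for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined not to satisfy the predetermined similarity requirement, creating a face sequence for the face image of the pair in the subsequent video frame.
  8. The method according to any one of claims 1 to 7, characterized in that the method further comprises obtaining face features, and obtaining face features comprises:
    performing face detection on a video frame with a face detector to obtain bounding box information of at least one face image in the video frame;
    providing the video frame and the bounding box information of the at least one face image in the video frame to a neural network for extracting face features, and obtaining the face features of the at least one face image in the video frame via the neural network.
  9. The method according to any one of claims 1 to 8, characterized in that, after forming a face sequence and before performing, for a face sequence, face recognition using the preset face library based at least on the face features in the face sequence, the method further comprises:
    clustering at least some of the face sequences according to their face features, so as to merge different face sequences that correspond to the same person;
    wherein, after the clustering, different face sequences correspond to different people.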
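  10. The method according to claim 9, characterized in that the face feature of a face sequence comprises:
    a weighted average of the face features of at least some of the face images in the face sequence.
  11. The method according to claim 10, characterized in that the weights of the face features of the at least some face images in the face sequence are determined according to the face image quality of those face images.
  12. The method according to claim 11, characterized in that the face image quality comprises at least one of: the light intensity of the face image, the sharpness of the face image, and the face orientation.
  13. The method according to any one of claims 1 to 12, characterized in that the face library comprises the face features of a plurality of people, and for any person, that person's face features comprise: the person's composite face feature, and the person's face features in different pictures.
  14. The method according to claim 13, characterized in that the composite face feature comprises: a weighted average of the person's face features in different pictures.
  15. The method according to any one of claims 13 to 14, characterized in that performing, for a face sequence, face recognition using the preset face library based at least on the face features in the face sequence comprises:
    for at least one face feature in the face sequence, computing the similarity between that face feature and the composite face feature of at least one person in the face library, and determining the person in the face library corresponding to the highest similarity;
    voting over the people in the face library corresponding to the highest similarities determined for at least some of the face features in the face sequence, and taking the person with the most votes as the person to whom the face sequence belongs;
    for the face sequence, determining the confidence that the face sequence belongs to that person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs.
  16. The method according to claim 15, characterized in that determining the confidence that the face sequence belongs to the person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs, comprises:
    for at least one face feature in the face sequence, computing the similarity between that face feature and the face feature in a face feature set whose face pose is most similar to that of the face feature;
    determining the confidence that the face sequence belongs to the person according to the computed similarity between the face feature and the face feature in the face feature set whose face pose is most similar to that of the face feature;
    wherein the face feature set comprises: the person's face features in different pictures in the face library.
  17. The method according to claim 15 or 16, characterized in that determining the confidence that the face sequence belongs to the person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs, comprises:
    determining, from the face key points in the face sequence and the face key points in the face feature set, the face feature in the face feature set whose face pose is most similar to that of a face in the face sequence.
  18. The method according to any one of claims 1 to 17, characterized in that the method further comprises:
    obtaining the face key points of at least one face image in a video frame via a neural network.
  19. The method according to any one of claims 16 to 18, characterized in that determining the confidence that the face sequence belongs to the person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs, further comprises:
    correcting the confidence that the face sequence belongs to the person using the similarity between the face feature of the face sequence and the composite face feature of the person to whom the face sequence belongs.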
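  20. A video-based face recognition apparatus, characterized by comprising:
    a face sequence forming module, configured to form a face sequence from face images in a video that appear in a plurality of consecutive video frames and whose positions in the plurality of video frames meet a predetermined displacement requirement, wherein the face sequence is a set of face images belonging to the same person in the plurality of video frames;
    a face recognition module, configured to perform, for a face sequence, face recognition using a preset face library, based at least on the face features in the face sequence.
  21. The apparatus according to claim 20, characterized in that the face sequence forming module is configured to form a face sequence when face images of the same person appear in adjacent video frames and the displacement between the position of that person's face image in the previous video frame and its position in the subsequent video frame meets the predetermined displacement requirement.
  22. The apparatus according to any one of claims 20 to 21, characterized in that the face sequence forming module is configured to:
    obtain the face images belonging to the same person in N consecutive video frames of the video, N being an integer greater than two;
    determine, among the face images belonging to the same person, the face image pairs for which the displacement between the position in the previous video frame and the position in the subsequent video frame meets the predetermined displacement requirement;
    if the intersection-over-union ratio of the face image pairs that meet the predetermined displacement requirement, among the face images belonging to the same person, satisfies a preset ratio, form the face images belonging to the same person into one face sequence.
  23. The apparatus according to any one of claims 20 to 22, characterized in that the face images of the same person comprise: face images whose face-feature similarity meets a predetermined similarity requirement.
  24. The apparatus according to any one of claims 20 to 23, characterized in that the face sequence forming module is configured to:
    create a face sequence for each of one or more of the at least one face image in the video frame in which a face first appears in the video;
    create a face sequence for each of one or more of the at least one face image that does not appear in the previous video frame but appears in the subsequent video frame;
    assign face images of the same person that appear in the previous video frame and the subsequent video frame with continuous spatial positions to that person's face sequence.
  25. The apparatus according to claim 24, characterized in that the face sequence forming module is configured to:
    obtain, for two adjacent video frames, the face features of at least one face image in the previous video frame, the position of the at least one face image in the previous video frame, the face features of at least one face image in the subsequent video frame, and the position of the at least one face image in the subsequent video frame;
    determine, from the position of the at least one face image in the previous video frame and the position of the at least one face image in the subsequent video frame, the face image pairs whose displacement meets the predetermined displacement requirement;
    for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined to satisfy the predetermined similarity requirement, determine that the face image of the pair in the subsequent video frame belongs to the face sequence to which the face image in the previous video frame belongs.
  26. The apparatus according to claim 24 or 25, characterized in that the face sequence forming module is configured to, for a face image pair whose displacement meets the predetermined displacement requirement, when the similarity of the pair's face features is determined not to satisfy the predetermined similarity requirement, create a face sequence for the face image of the pair in the subsequent video frame.
  27. The apparatus according to any one of claims 20 to 26, characterized in that the apparatus further comprises a face feature acquisition module, configured to:
    perform face detection on a video frame with a face detector to obtain bounding box information of at least one face image in the video frame;
    provide the video frame and the bounding box information of the at least one face image in the video frame to a neural network for extracting face features, and obtain the face features of the at least one face image in the video frame via the neural network.
  28. The apparatus according to any one of claims 20 to 27, characterized in that the apparatus further comprises:
    a face sequence clustering module, configured to cluster at least some of the face sequences according to their face features, so as to merge different face sequences that correspond to the same person;
    wherein, after the clustering, different face sequences correspond to different people, and the clustered face sequences are provided to the face recognition module.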
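  29. The apparatus according to claim 28, characterized in that the face feature of a face sequence comprises:
    a weighted average of the face features of at least some of the face images in the face sequence.
  30. The apparatus according to claim 29, characterized in that the weights of the face features of the at least some face images in the face sequence are determined according to the face image quality of those face images.
  31. The apparatus according to claim 30, characterized in that the face image quality comprises at least one of: the light intensity of the face image, the sharpness of the face image, and the face orientation.
  32. The apparatus according to any one of claims 20 to 31, characterized in that the face library comprises the face features of a plurality of people, and for any person, that person's face features comprise: the person's composite face feature, and the person's face features in different pictures.
  33. The apparatus according to claim 32, characterized in that the composite face feature comprises: a weighted average of the person's face features in different pictures.
  34. The apparatus according to any one of claims 32 to 33, characterized in that the face recognition module is configured to:
    for at least one face feature in a face sequence, compute the similarity between that face feature and the composite face feature of at least one person in the face library, and determine the person in the face library corresponding to the highest similarity;
    vote over the people in the face library corresponding to the highest similarities determined for at least some of the face features in the face sequence, and take the person with the most votes as the person to whom the face sequence belongs;
    for the face sequence, determine the confidence that the face sequence belongs to that person, based at least on the similarity between at least one face feature in the face sequence and the face features, in different pictures in the face library, of the person to whom the face sequence belongs.
  35. The apparatus according to claim 34, characterized in that the face recognition module is configured to:
    for any face feature in the face sequence, compute the similarity between that face feature and the face feature in a face feature set whose face pose is most similar to that of the face feature;
    determine the confidence that the face sequence belongs to the person according to the computed similarity between the face feature and the face feature in the face feature set whose face pose is most similar to that of the face feature;
    wherein the face feature set comprises: the person's face features in different pictures in the face library.
  36. The apparatus according to claim 34 or 35, characterized in that the face recognition module is configured to determine, from the face key points in the face sequence and the face key points in the face feature set, the face feature in the face feature set whose face pose is most similar to that of a face in the face sequence.
  37. The apparatus according to any one of claims 20 to 36, characterized in that the face key points of at least one face image in a video frame are obtained via a neural network.
  38. The apparatus according to any one of claims 35 to 37, characterized in that the face recognition module is further configured to correct the confidence that the face sequence belongs to the person using the similarity between the face feature of the face sequence and the composite face feature of the person to whom the face sequence belongs.
  39. An electronic device, characterized by comprising: the apparatus according to any one of claims 20 to 38.
  40. An electronic device, characterized by comprising:
    a memory for storing a computer program;
    a processor for executing the computer program stored in the memory, wherein the computer program, when executed, implements the method according to any one of claims 1 to 19.
  41. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 19.
  42. A computer program comprising computer instructions, characterized in that the computer instructions, when run in a processor of a device, implement the method according to any one of claims 1 to 19.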
PCT/CN2018/117662 2017-11-30 2018-11-27 Video-based face recognition method, apparatus, device, medium, and program WO2019105337A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/455,173 US11068697B2 (en) 2017-11-30 2019-06-27 Methods and apparatus for video-based facial recognition, electronic devices, and storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711243717.9 2017-11-30
CN201711243717.9A CN108229322B (zh) 2017-11-30 2017-11-30 Video-based face recognition method and apparatus, electronic device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/455,173 Continuation US11068697B2 (en) 2017-11-30 2019-06-27 Methods and apparatus for video-based facial recognition, electronic devices, and storage media

Publications (1)

Publication Number Publication Date
WO2019105337A1 (zh) 2019-06-06

Family

ID=62653718

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/117662 WO2019105337A1 (zh) 2017-11-30 2018-11-27 Video-based face recognition method, apparatus, device, medium, and program

Country Status (3)

Country Link
US (1) US11068697B2 (zh)
CN (1) CN108229322B (zh)
WO (1) WO2019105337A1 (zh)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
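CN108229322B (zh) 2017-11-30 2021-02-12 北京市商汤科技开发有限公司 Video-based face recognition method and apparatus, electronic device, and storage medium
CN109190449A (zh) * 2018-07-09 2019-01-11 北京达佳互联信息技术有限公司 Age recognition method and apparatus, electronic device, and storage medium
CN109063580A (zh) * 2018-07-09 2018-12-21 北京达佳互联信息技术有限公司 Face recognition method and apparatus, electronic device, and storage medium
CN109063611B (zh) * 2018-07-19 2021-01-05 北京影谱科技股份有限公司 Method and apparatus for processing face recognition results based on video semantics
CN110866428B (zh) * 2018-08-28 2023-12-15 杭州海康威视数字技术股份有限公司 Target tracking method and apparatus, electronic device, and storage medium
CN110874547B (zh) * 2018-08-30 2023-09-12 富士通株式会社 Method and device for recognizing an object from video
CN109389691B (zh) * 2018-09-29 2021-05-14 爱笔(北京)智能科技有限公司 Biometric-information-based ticketing processing method and system, server, and client
CN109241345B (zh) * 2018-10-10 2022-10-14 百度在线网络技术(北京)有限公司 Video positioning method and apparatus based on face recognition
CN109167935A (zh) * 2018-10-15 2019-01-08 Oppo广东移动通信有限公司 Video processing method and apparatus, electronic device, and computer-readable storage medium
CN109543641B (zh) * 2018-11-30 2021-01-26 厦门市美亚柏科信息股份有限公司 Multi-target deduplication method for real-time video, terminal device, and storage medium
CN109753917A (zh) * 2018-12-29 2019-05-14 中国科学院重庆绿色智能技术研究院 Face quality optimization method and system, computer-readable storage medium, and device
CN110263744B (zh) * 2019-06-26 2021-05-11 苏州万店掌网络科技有限公司 Method for improving the imperceptible face recognition rate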
US11126830B2 (en) * 2019-09-17 2021-09-21 Verizon Media Inc. Computerized system and method for adaptive stranger detection
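CN110852269B (zh) * 2019-11-11 2022-05-20 青岛海信网络科技股份有限公司 Cross-camera portrait association analysis method and apparatus based on feature clustering
CN110874583A (zh) * 2019-11-19 2020-03-10 北京精准沟通传媒科技股份有限公司 Passenger flow statistics method and apparatus, storage medium, and electronic device
CN110991281B (zh) * 2019-11-21 2022-11-04 电子科技大学 Dynamic face recognition method
CN111079670B (zh) * 2019-12-20 2023-11-03 北京百度网讯科技有限公司 Face recognition method and apparatus, terminal, and medium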
US11687778B2 (en) 2020-01-06 2023-06-27 The Research Foundation For The State University Of New York Fakecatcher: detection of synthetic portrait videos using biological signals
US11425317B2 (en) * 2020-01-22 2022-08-23 Sling Media Pvt. Ltd. Method and apparatus for interactive replacement of character faces in a video device
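CN111401315B (zh) * 2020-04-10 2023-08-22 浙江大华技术股份有限公司 Video-based face recognition method, recognition apparatus, and storage apparatus
CN111738120B (zh) * 2020-06-12 2023-12-05 北京奇艺世纪科技有限公司 Person recognition method and apparatus, electronic device, and storage medium
CN112084857A (zh) * 2020-08-05 2020-12-15 深圳市永达电子信息股份有限公司 Face recognition method and recognition system for video streams
CN112188091B (zh) * 2020-09-24 2022-05-06 北京达佳互联信息技术有限公司 Face information recognition method and apparatus, electronic device, and storage medium
CN112836682B (zh) * 2021-03-04 2024-05-28 广东建邦计算机软件股份有限公司 Method and apparatus for recognizing objects in video, computer device, and storage medium
CN113408348B (zh) * 2021-05-14 2022-08-19 桂林电子科技大学 Video-based face recognition method and apparatus, and storage medium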
US11521447B1 (en) * 2021-08-16 2022-12-06 Mark Ellery Ogram Anti-shoplifting system
US11462065B1 (en) 2022-01-05 2022-10-04 Mark Ellery Ogram Security system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
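CN102752540A (zh) * 2011-12-30 2012-10-24 新奥特(北京)视频技术有限公司 Automatic cataloguing method based on face recognition technology
CN103530652A (zh) * 2013-10-23 2014-01-22 北京中视广信科技有限公司 Video cataloguing method and retrieval method based on face clustering, and system thereof
CN104036236A (zh) * 2014-05-27 2014-09-10 厦门瑞为信息技术有限公司 Face gender recognition method based on multi-parameter exponential weighting
CN105631408A (zh) * 2015-12-21 2016-06-01 小米科技有限责任公司 Video-based face album processing method and apparatus
CN108229322A (zh) * 2017-11-30 2018-06-29 北京市商汤科技开发有限公司 Video-based face recognition method and apparatus, electronic device, and storage medium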

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5699449A (en) * 1994-11-14 1997-12-16 The University Of Connecticut Method and apparatus for implementation of neural networks for face recognition
WO2000016243A1 (en) * 1998-09-10 2000-03-23 Mate - Media Access Technologies Ltd. Method of face indexing for efficient browsing and searching of people in video
US7697026B2 (en) * 2004-03-16 2010-04-13 3Vr Security, Inc. Pipeline architecture for analyzing multiple video streams
US7796785B2 (en) * 2005-03-03 2010-09-14 Fujifilm Corporation Image extracting apparatus, image extracting method, and image extracting program
US8861804B1 (en) * 2012-06-15 2014-10-14 Shutterfly, Inc. Assisted photo-tagging with facial recognition models
CN104008370B (zh) * 2014-05-19 2017-06-13 清华大学 Video face recognition method
WO2016061780A1 (en) * 2014-10-23 2016-04-28 Intel Corporation Method and system of facial expression recognition using linear relationships within landmark subsets
US9430694B2 (en) * 2014-11-06 2016-08-30 TCL Research America Inc. Face recognition system and method
CN105868695B (zh) * 2016-03-24 2019-04-02 北京握奇数据系统有限公司 Face recognition method and system
CN106919917A (zh) * 2017-02-24 2017-07-04 北京中科神探科技有限公司 Face comparison method

Also Published As

Publication number Publication date
US11068697B2 (en) 2021-07-20
CN108229322A (zh) 2018-06-29
US20190318153A1 (en) 2019-10-17
CN108229322B (zh) 2021-02-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18883213

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23/09/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18883213

Country of ref document: EP

Kind code of ref document: A1