CN112364829B - Face recognition method, device, equipment and storage medium - Google Patents

Face recognition method, device, equipment and storage medium

Info

Publication number
CN112364829B
Authority
CN
China
Prior art keywords
face
label
score
candidate
recognized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011380912.8A
Other languages
Chinese (zh)
Other versions
CN112364829A (en)
Inventor
文彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd
Priority to CN202011380912.8A
Publication of CN112364829A
Application granted
Publication of CN112364829B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the disclosure discloses a face recognition method, a face recognition device, face recognition equipment and a storage medium. The method comprises the following steps: determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score between the candidate face label and the face to be recognized, and a target text corresponding to the current image frame, wherein the candidate face label is a face label corresponding to a face feature in a face image library; determining a relevance score between the candidate face label and the target text; and determining a target candidate face label as the recognition result of the face to be recognized according to the first similarity score and the relevance score. By recognizing the face to be recognized in the current image frame using both the image and text modalities, the scheme improves the accuracy of the face recognition result.

Description

Face recognition method, device, equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, and in particular relates to a face recognition method, a face recognition device, face recognition equipment and a storage medium.
Background
With the rapid development of artificial intelligence and the exponential growth of short video data, more and more short video service scenes need the identification capability of fine-grained entities, especially the identification capability of specific users, so as to realize efficient video recommendation and retrieval.
The traditional face recognition method is realized based on face detection and feature matching. This recognition method performs efficiently in the traditional picture search field, but its performance is generally poor in dynamic video scenes involving, for example, large resolution spans and large illumination differences.
Disclosure of Invention
The embodiment of the disclosure provides a face recognition method, a face recognition device, face recognition equipment and a storage medium, which can optimize the existing face recognition scheme.
In a first aspect, an embodiment of the present disclosure provides a face recognition method, including:
determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame, wherein the candidate face label is a face label corresponding to a face feature in a face image library;
determining a relevance score of the candidate face label and the target text;
and determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score.
In a second aspect, an embodiment of the present disclosure further provides a face recognition apparatus, including:
the system comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame, and the candidate face label is a face label corresponding to a face feature in a face image library;
a second determination module for determining a relevance score of the candidate face label and the target text;
and the third determining module is used for determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, implement a face recognition method as described in the first aspect.
In a fourth aspect, the disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the face recognition method according to the first aspect.
The disclosed embodiment provides a face recognition method, a device, equipment and a storage medium, wherein a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame are determined, and the candidate face label is a face label corresponding to a face feature in a face image library; determining a relevance score of the candidate face label and the target text; and determining a target candidate face label as the recognition result of the face to be recognized according to the first similarity score and the correlation score. According to the scheme, the face to be recognized in the current image frame is recognized in the image and text modes, and the accuracy of the face recognition result is improved.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale.
Fig. 1 is a flowchart of a face recognition method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a face recognition method according to a second embodiment of the present disclosure;
fig. 3 is a schematic diagram of an implementation process of face recognition according to a second embodiment of the present disclosure;
fig. 4 is a structural diagram of a face recognition apparatus according to a third embodiment of the present disclosure;
fig. 5 is a structural diagram of an electronic device according to a fourth embodiment of the disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the first, second, etc. concepts mentioned in the disclosure are only used for distinguishing different objects, and are not used for limiting the order or interdependence relationship of the objects.
It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Example one
Fig. 1 is a flowchart of a face recognition method according to an embodiment of the present disclosure. The method is applicable to recognizing a face in a video or an image, and in particular to recognizing a face in video scenes with a large resolution span, a large illumination difference, heavy beautification filtering, or dynamic movement of a person. The method may be executed by a face recognition apparatus, which may be implemented in software and/or hardware and configured in an electronic device. The electronic device may be a terminal with an image data processing function, for example, a mobile terminal such as a mobile phone, a tablet, or a laptop, a fixed terminal such as a desktop computer, or a server. As shown in fig. 1, the method specifically includes the following steps:
s110, determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame.
The candidate face label is a face label corresponding to a face feature in the face image library. The current image frame may be an image frame obtained from a video or an image; this embodiment takes obtaining the current image frame from a video as an example. Optionally, one or more image frames in the video may be randomly acquired as the current image frame; one or more image frames in the video may also be acquired as the current image frame in chronological order, and when the current image frame includes a plurality of image frames, the time intervals between adjacent image frames may be the same or different. The face to be recognized is a face included in the current image frame that is to be recognized; besides the face to be recognized, the current image frame may also include other faces or objects not to be recognized.
The candidate face label is a face label similar to a face to be recognized in a face image library, the face image library is used for storing a corresponding relation between a reference face feature and the face label, the reference face feature is a feature of a face image corresponding to the face label, and the feature can be a global feature of the face image. Optionally, the face features of the face to be recognized may be extracted, the face features are compared with reference face features in a face image library to obtain similarity scores of the face features and the reference face features, and a candidate face label corresponding to the face to be recognized is determined according to the similarity scores. The similarity score can be represented by a numerical value between 0 and 1, and the higher the similarity score is, the higher the similarity degree between the candidate face label and the face to be recognized is. The first similarity score is the similarity score of the candidate face label and the face to be recognized and is used for representing the similarity degree of the candidate face label and the face to be recognized.
The target text may be a text corresponding to characters included in the current image frame, may also be a text corresponding to voice data of a certain video including the current image frame, and may also be a text obtained based on the characters included in the current image frame and the voice data of the certain video including the current image frame. Optionally, characters included in the current image frame may be recognized by an OCR (Optical Character Recognition) technology, so as to obtain a text corresponding to the characters; the voice data of a certain video including the current image frame is recognized through an ASR (Automatic Speech Recognition) technology, and a corresponding text is obtained. Of course, other text recognition techniques or voice recognition techniques may also be used, and the embodiment is not particularly limited.
And S120, determining the relevance scores of the candidate face labels and the target text.
The relevance score is used for representing the relevance degree of the candidate face label and the target text, and can be represented by a numerical value between 0 and 1, wherein 0 can represent that the candidate face label and the target text are not related completely, 1 can represent that the candidate face label and the target text are related completely, and the higher the relevance score is, the higher the relevance degree of the candidate face label and the target text is. Optionally, the candidate face label and the target text may be input into a pre-trained correlation model, and a correlation score between the candidate face label and the target text is output by the correlation model, where the correlation model may be a neural network model based on deep learning, and the embodiment does not limit a specific structure of the neural network model.
In order to improve the accuracy of the subsequent recognition result, the relevance model may perform fuzzy text matching when matching the candidate face label with the target text to obtain the relevance score. For example, when the target text includes the name "Zhang San", the candidate face label may be matched as "Zhang *", where "*" denotes an uncertain character that may be "San" or a character other than "San".
S130, determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score.
The target candidate face label is a candidate face label of which the similarity degree with the face to be recognized in the candidate face labels meets set conditions. The target candidate face label is determined based on the candidate face label, the first similarity score of the face to be recognized and the correlation score of the candidate face label and the target text, multiple modalities such as images and texts are considered, and particularly in a dynamic video scene, the accuracy of a face recognition result can be improved. Optionally, the first similarity score and the correlation score may be linearly combined to obtain a composite score, and the candidate face label with the composite score larger than the set threshold is determined as the target candidate face label and returned to the user as the recognition result of the face to be recognized.
The method comprises the steps of determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame, wherein the candidate face label is a face label corresponding to face features in a face image library; determining a relevance score of the candidate face label and the target text; and determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score. According to the scheme, the face to be recognized in the current image frame is recognized in the image and text modes, and the accuracy of the face recognition result is improved.
Example two
Fig. 2 is a flowchart of a face recognition method provided in the second embodiment of the present disclosure, and this embodiment is optimized on the basis of the above embodiments, and referring to fig. 2, the method may include the following steps:
and S210, extracting the face features to be recognized of the face to be recognized.
The face features to be recognized may be global features of the face to be recognized. The embodiment does not limit the specific feature extraction method. For example, a face to be recognized may be input into the feature extraction model, and the feature extraction model outputs the global features of the face to be recognized. The face is identified through the global features, and the accuracy of the identification result can be improved.
And S220, determining second similarity scores of the human face features to be recognized and the reference human face features.
The reference face features are face features in the face image library, which stores the correspondence between reference face features and face labels; each reference face feature may correspond to one face label, and the library may store a plurality of face labels and a plurality of reference face features. The second similarity score is the similarity score between the face feature to be recognized and each reference face feature in the face image library, and represents their degree of similarity. In one example, the cosine of the angle between the face feature to be recognized and each reference face feature may be computed, and the similarity score determined according to that cosine value. In another example, the face feature to be recognized and a reference face feature may be input into a similarity model, which outputs their similarity score. Considering that the number of reference face features in the face image library is large, the face feature to be recognized and the reference face features may be input into the similarity model in batches to improve efficiency, with the model outputting the similarity scores for each batch. The similarity model may adopt a neural network model, and the embodiment does not limit its specific structure.
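As a concrete illustration of the cosine-based comparison above, the following sketch computes second similarity scores between a face feature to be recognized and every reference face feature in a face image library. The feature vectors, dimensionality, and function names are illustrative assumptions, not part of the patent.

```python
import math

def cosine_score(feature_a, feature_b):
    """Cosine of the angle between two face feature vectors.

    A score close to 1 means the two features point in almost the same
    direction, i.e. the faces are likely the same person.
    """
    dot = sum(x * y for x, y in zip(feature_a, feature_b))
    norm_a = math.sqrt(sum(x * x for x in feature_a))
    norm_b = math.sqrt(sum(y * y for y in feature_b))
    return dot / (norm_a * norm_b)

def second_similarity_scores(query_feature, face_image_library):
    """Second similarity scores of the face feature to be recognized
    against each reference face feature in the face image library,
    here modeled as a mapping from face label to reference feature."""
    return {label: cosine_score(query_feature, ref)
            for label, ref in face_image_library.items()}

# Toy 4-dimensional features; real face embeddings are much longer.
library = {
    "label_a": [1.0, 0.0, 0.0, 0.0],  # same direction as the query
    "label_b": [0.0, 1.0, 0.0, 0.0],  # orthogonal to the query
}
scores = second_similarity_scores([1.0, 0.0, 0.0, 0.0], library)
```

In practice the comparison would run over batched embeddings (for example with a vectorized library), matching the batching the text describes for the similarity model.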
S230, determining a candidate face label and a first similarity score of the candidate face label and the face to be recognized according to the title information of the current image frame and the second similarity score.
The title information of the current image frame may be the title information of the video corresponding to the current image frame, and optionally, the title information of the video may be recognized by an OCR technology as the title information of the current image frame. According to the embodiment, the candidate face label is determined based on the title information of the current image frame and the second similarity score, so that the accuracy of the candidate face label can be effectively improved. The first similarity score is the similarity score of the candidate face label and the face to be recognized and is used for representing the similarity degree of the candidate face label and the face to be recognized.
Optionally, the candidate face label and the first similarity score between the candidate face label and the face to be recognized may be determined as follows:
s2301, determining a first reference face feature of which the second similarity score between the reference face feature and the face to be recognized is larger than a first set threshold, and taking a first face label corresponding to the first reference face feature as a first candidate face label.
The first reference face feature is a face feature corresponding to the condition that the second similarity score in the reference face features is larger than a first set threshold value. It can be understood that a large number of reference face features and corresponding face labels are stored in the face image library, and in order to improve the recognition efficiency and the accuracy of the recognition result, the face labels in the face image library can be screened, and the reference face features meeting the preset conditions and the corresponding face labels are selected for the next operation. The reference face features meeting the preset condition may be the reference face features corresponding to the second similarity score which is greater than the first set threshold, and the embodiment records the part of the reference face features as the first reference face features. The first set threshold may be determined based on a preset number of preliminarily screened face tags, for example, if the preset number of preliminarily screened face tags is 50, the second similarity scores may be sorted, and the first set threshold is determined based on the sorting result and the number 50, where the number of reference face features in the face image library is greater than 50. The first face label is a face label corresponding to the first reference face feature, and the embodiment marks the first face label as a first candidate face label.
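The screening step above, where the first set threshold is derived from a preset number of preliminarily screened face labels, might look like the following sketch. The preset count of 50 comes from the text's example; the helper names and the tie-handling choice are illustrative assumptions.

```python
def first_set_threshold(second_scores, preset_count):
    """Sort the second similarity scores in descending order and take
    the preset_count-th highest score as the first set threshold."""
    ranked = sorted(second_scores, reverse=True)
    return ranked[preset_count - 1]

def first_candidate_labels(label_scores, preset_count):
    """Keep the face labels whose second similarity score reaches the
    first set threshold. The text's strict 'greater than' becomes '>='
    here so that exactly preset_count labels survive when all scores
    are distinct."""
    threshold = first_set_threshold(label_scores.values(), preset_count)
    return {label for label, s in label_scores.items() if s >= threshold}

# Small library standing in for the text's example of keeping the top 50.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.3, "e": 0.1}
candidates = first_candidate_labels(scores, 2)  # keeps "a" and "b"
```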
S2302, determining the matching degree of the first candidate face label and the title information.
The matching degree is used for representing the matching degree of the first candidate face label and the title information, and the first candidate face label can be further screened according to the matching degree to a certain extent, so that the identification range is narrowed. Optionally, the first candidate face label and the header information may be input into the matching degree model, and the matching degree of the first candidate face label and the header information is output by the matching degree model, which may also be in other manners as long as the matching degree of the first candidate face label and the header information can be determined.
S2303, determining a third similarity score of the first candidate face label and the face to be recognized according to the matching degree.
Optionally, if the first candidate face tag and the header information are successfully matched, a first bias may be added to the second similarity score corresponding to the first candidate face tag to improve the confidence of the first candidate face tag, and the embodiment does not limit the magnitude of the first bias, for example, the same first bias may be added to the second similarity score successfully matched, or the magnitude of the first bias may be determined according to the matching degree, for example, the higher the matching degree is, the larger the first bias is. Whether the first candidate face label and the title information are successfully matched or not can be determined according to the matching degree between the first candidate face label and the title information, for example, when the matching degree is greater than a set threshold value, the first candidate face label and the title information are considered to be successfully matched, otherwise, the first candidate face label and the title information are considered to be unsuccessfully matched, and if the first candidate face label and the title information are unsuccessfully matched, the second similarity score corresponding to the first candidate face label can be kept unchanged. And obtaining a third similarity score of the first candidate face label and the face to be recognized after matching is finished.
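The bias step just described can be sketched as follows. The match threshold and the magnitude of the first bias are illustrative assumptions, since the patent deliberately leaves both open.

```python
def third_similarity_score(second_score, matching_degree,
                           match_threshold=0.5, first_bias=0.1):
    """Third similarity score of a first candidate face label.

    If the label and the title information match successfully (the
    matching degree exceeds a set threshold), a first bias is added to
    the second similarity score to raise the label's confidence;
    otherwise the second similarity score is kept unchanged.
    """
    if matching_degree > match_threshold:
        return second_score + first_bias
    return second_score
```

A variant the text also suggests would scale `first_bias` with `matching_degree` (the higher the matching degree, the larger the bias) instead of adding a constant.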
S2304, determining a candidate face label according to the third similarity score and the first candidate face label, and taking the third similarity score corresponding to the candidate face label as the first similarity score of the candidate face label and the face to be recognized.
The candidate face label is a face label among the first candidate face labels. Optionally, the first candidate face labels may be further screened based on the third similarity scores obtained after matching; for example, the first candidate face labels whose third similarity score is larger than a second set threshold may be taken as second candidate face labels, and the size of the second set threshold may be set according to the actual situation. The number of occurrences of the same face among the second candidate face labels is then counted, and a second candidate face label whose occurrence count meets a second preset condition is determined as a candidate face label. The second preset condition may be that the occurrence count is greater than or equal to a set threshold, whose size may be set according to the actual situation. For example, if the number of screened second candidate face labels is 10 and the set threshold is 5, where Zhang San occurs 6 times, Li Si occurs 2 times, and Wang Wu occurs 2 times, then the face label corresponding to Zhang San among the second candidate face labels may be taken as the candidate face label.
The first similarity score is the similarity score between the determined candidate face label and the face to be recognized, and is equal to the third similarity score between the second candidate face label corresponding to the candidate face label and the face to be recognized in value.
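The occurrence-count screening in S2304 can be sketched with the text's own numbers (10 second candidate labels, set threshold 5); the labels are illustrative placeholder names and the helper name is an assumption.

```python
from collections import Counter

def candidate_face_labels(second_candidate_labels, occurrence_threshold):
    """Count how often the same face label occurs among the second
    candidate face labels and keep those meeting the second preset
    condition (occurrence count >= the set threshold)."""
    counts = Counter(second_candidate_labels)
    return [label for label, n in counts.items() if n >= occurrence_threshold]

# The text's example: one label appears 6 times, two others twice each.
second_candidates = ["Zhang San"] * 6 + ["Li Si"] * 2 + ["Wang Wu"] * 2
result = candidate_face_labels(second_candidates, 5)  # only the 6-count label survives
```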
S240, identifying characters appearing in the current image frame to obtain a first text.
Optionally, an OCR technology may be used to recognize characters appearing in the current image frame to obtain the first text. Other character recognition techniques may also be used to recognize the characters appearing in the current image frame, and the embodiment is not limited in particular.
And S250, identifying voice data corresponding to the target video containing the current image frame to obtain a second text.
And voice data is considered in the process of recognizing the face, so that the accuracy of the recognition result can be improved. It can be understood that although the current image frame contains a face to be recognized, the current image frame does not necessarily contain voice data of the face to be recognized, and therefore the voice data of the target video containing the current image frame can be acquired as the voice data corresponding to the current image frame for assisting in determining the face recognition result. Optionally, a video with a set duration including the current image frame may be used as the target video, or the entire video including the current image frame may be used as the target video. The target video may include voice data corresponding to a face to be recognized. The second text is a text corresponding to the voice data of the target video, and optionally, the ASR technology may be used to recognize the voice data of the target video to obtain the second text. Other speech recognition techniques may also be used to recognize the speech data of the target video, and the embodiment is not limited in particular.
It should be noted that, in practical application, the order of S240 and S250 is not limited; that is, S240 may be executed before S250, S250 may be executed before S240, or S240 and S250 may be executed simultaneously. Fig. 2 is only exemplary of one implementation.
S260, splicing the first text and the second text to obtain the target text.
The embodiment does not limit the splicing order of the first text and the second text, for example, the first text may be located before the second text or located after the second text.
And S270, determining the relevance scores of the candidate face labels and the target text.
S280, determining a first comprehensive score corresponding to the candidate face label according to the first similarity score and the correlation score.
Optionally, a corresponding weight coefficient may be set for the first similarity score according to its magnitude, and a corresponding weight coefficient may be set for the correlation score according to its magnitude; the first similarity score and the correlation score are then linearly combined to obtain the first comprehensive score. For example, if the first similarity scores are A and B with corresponding first and second weight coefficients a and b, and the correlation scores are D and E with corresponding third and fourth weight coefficients d and e, the first comprehensive score may be expressed as: S = a·A + b·B + d·D + e·E, where S is the first comprehensive score, a is the first weight coefficient, b is the second weight coefficient, d is the third weight coefficient, and e is the fourth weight coefficient.
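The linear combination of scores and weight coefficients can be sketched generically as a weighted sum; the example weight values below are illustrative assumptions, since the patent does not fix them.

```python
def first_comprehensive_score(scores, weights):
    """Linear combination S = sum(weight_i * score_i) of a candidate
    face label's similarity and correlation scores."""
    return sum(w * s for w, s in zip(weights, scores))

# One first similarity score (0.9) and one correlation score (0.6),
# combined with hedged example weights 0.7 and 0.3.
S = first_comprehensive_score([0.9, 0.6], [0.7, 0.3])  # 0.7*0.9 + 0.3*0.6 = 0.81
```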
And S290, judging whether the first comprehensive score is larger than or equal to a third set threshold value, if so, executing S2100, otherwise, executing S2110.
Specifically, when the first comprehensive score is greater than or equal to the third set threshold, the face recognition result may be determined directly based on the first comprehensive score; if the first comprehensive score is less than the third set threshold, the quality score of the face to be recognized needs to be considered further. The quality score measures the quality of the face to be recognized, for example, whether it is a frontal face and whether it is clear: the higher the quality score, the clearer the face to be recognized and the closer it is to a frontal face. Compared with the traditional approach of screening face images based on the quality score during recognition, this embodiment decides whether to further determine the quality score only after the first comprehensive score has been determined, based on its magnitude; this avoids discarding part of the images with poor quality scores outright and improves the accuracy of the face recognition result.
S2100, taking the first comprehensive score as a comprehensive score corresponding to the candidate face label.
Specifically, when the first comprehensive score is greater than or equal to the third set threshold, the first comprehensive score can be directly used as the comprehensive score corresponding to the candidate face label, and the face recognition result is determined according to the comprehensive score without considering the quality of the face to be recognized.
And S2110, determining the quality score of the face to be recognized, and determining a comprehensive score corresponding to the candidate face label according to the quality score and the first comprehensive score.
When the first comprehensive score is smaller than the third set threshold, the quality of the face to be recognized is further assessed to obtain a quality score, and the comprehensive score corresponding to each candidate face label is obtained by combining it with the first comprehensive score. Optionally, if the quality score is smaller than a fourth set threshold, indicating that the quality of the face to be recognized is poor, the quality score may be ignored and the first comprehensive score used directly as the comprehensive score corresponding to the candidate face label; if the quality score is larger than a fifth set threshold, the first comprehensive score and the quality score may be linearly combined to obtain the comprehensive score. Optionally, a weight coefficient may be assigned to the quality score based on its magnitude, and the product of the weight coefficient and the quality score linearly superimposed on the first comprehensive score to obtain the comprehensive score. The fourth and fifth set thresholds may be set according to actual conditions, and they may be the same or different.
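The branching of S290-S2110 can be sketched as follows. The numeric thresholds and the quality weight are illustrative assumptions, and for simplicity the fourth and fifth set thresholds are assumed equal (the embodiment allows them to differ):

```python
def comprehensive_score(first_score: float, quality_score: float,
                        third_threshold: float = 0.8,
                        quality_threshold: float = 0.3,
                        quality_weight: float = 0.2) -> float:
    """S290-S2110: if the first comprehensive score clears the third set
    threshold, use it directly; otherwise ignore a poor quality score,
    or linearly superimpose a good one. All numeric defaults here are
    illustrative assumptions, not values fixed by the disclosure."""
    if first_score >= third_threshold:
        return first_score               # S2100: quality not considered
    if quality_score < quality_threshold:
        return first_score               # quality too poor; ignore it
    return first_score + quality_weight * quality_score  # linear combination
```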
And S2120, sequentially arranging corresponding candidate face labels according to the comprehensive scores, and taking the candidate face labels with the comprehensive scores larger than a set threshold value as target candidate face labels.
Specifically, the candidate face labels may be arranged in order of their comprehensive scores, and the ranking result may be output and displayed. The larger the comprehensive score, the more similar the face corresponding to the candidate face label is to the face to be recognized, and the higher the confidence. In this embodiment, the candidate face labels whose comprehensive score is greater than the set threshold are taken as the recognition result of the face to be recognized.
And S2130, displaying the sequencing result of the target candidate face label.
Optionally, the target candidate face labels may be displayed in descending order of comprehensive score, or in ascending order of comprehensive score.
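The ranking and thresholding of S2120, followed by the ordered display of S2130, can be sketched as follows; the label names, scores, and the 0.6 threshold are made up for illustration:

```python
def rank_target_labels(label_scores: dict[str, float], threshold: float):
    """S2120: keep candidate face labels whose comprehensive score
    exceeds the set threshold; S2130: sort for display, here in
    descending order (ascending order is equally permitted)."""
    kept = [(label, score) for label, score in label_scores.items()
            if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Illustrative scores, not data from the disclosure:
print(rank_target_labels({"label_A": 0.91, "label_B": 0.72, "label_C": 0.55}, 0.6))
```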
Exemplarily, referring to fig. 3, fig. 3 is a schematic diagram of an implementation process of face recognition according to the second embodiment of the present disclosure. Video data A may be analyzed from multiple modalities to recognize a specific face. On the visual side, image frames and title information may be extracted from the video, and candidate face labels may be determined from the face image library based on them. On the text side, the characters contained in an image frame and the voice data of the video containing that frame may be recognized to obtain the corresponding texts, and the target candidate face labels may be determined based on these texts and the candidate face labels. Meanwhile, the target candidate face labels may be ranked according to their comprehensive scores, and the ranking result may be presented.
The second embodiment of the present disclosure provides a face recognition method, which recognizes a face from multiple modalities such as text, voice, and image on the basis of the first embodiment, and thus effectively improves the accuracy of a face recognition result.
EXAMPLE III
Fig. 4 is a structural diagram of a face recognition apparatus according to a third embodiment of the present disclosure, where the apparatus may execute the face recognition method according to the foregoing embodiment, and referring to fig. 4, the apparatus may include:
a first determining module 31, configured to determine a candidate face tag corresponding to a face to be recognized in a current image frame, a first similarity score between the candidate face tag and the face to be recognized, and a target text corresponding to the current image frame, where the candidate face tag is a face tag corresponding to a face feature in a face image library;
a second determining module 32, configured to determine a relevance score between the candidate face label and the target text;
a third determining module 33, configured to determine, according to the first similarity score and the correlation score, a target candidate face label as a recognition result of the face to be recognized.
The third embodiment of the present disclosure provides a face recognition apparatus, which determines a candidate face tag corresponding to a face to be recognized in a current image frame, a first similarity score between the candidate face tag and the face to be recognized, and a target text corresponding to the current image frame, where the candidate face tag is a face tag corresponding to a face feature in a face image library; determining a relevance score of the candidate face label and the target text; and determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score. According to the scheme, the face to be recognized in the current image frame is recognized in the image and text modes, and the accuracy of the face recognition result is improved.
On the basis of the above embodiment, the first determining module 31 includes:
the characteristic extraction unit is used for extracting the human face characteristics to be recognized of the human face to be recognized;
the second similarity score determining unit is used for determining second similarity scores of the human face features to be recognized and reference human face features, wherein the reference human face features are human face features in the human face image library, and the human face image library is used for storing the corresponding relation between the reference human face features and human face labels;
and the first similarity score determining unit is used for determining a candidate face label and a first similarity score of the candidate face label and the face to be recognized according to the title information of the current image frame and the second similarity score.
On the basis of the above embodiment, the first similarity score determining unit includes:
a first candidate face image determining subunit, configured to determine a first reference face feature whose second similarity score with the face to be recognized is greater than a first set threshold, and use a first face label corresponding to the first reference face feature as a first candidate face label;
a matching degree determining subunit, configured to determine a matching degree between the first candidate face label and the title information;
a third similarity score determining subunit, configured to determine, according to the matching degree, a third similarity score between the first candidate face tag and the face to be recognized;
and the candidate face image determining subunit is configured to determine a candidate face label according to the third similarity score and the first candidate face label, and use the third similarity score corresponding to the candidate face label as the first similarity score between the candidate face label and the face to be recognized.
On the basis of the foregoing embodiment, the candidate face image determination subunit is specifically configured to:
taking the first candidate face label corresponding to the third similarity score larger than the second set threshold value as a second candidate face label;
counting the number of occurrences of the same face among the second candidate face labels;
and determining a second candidate face label with the occurrence frequency meeting a second preset condition as a candidate face label.
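This filter-count-select sequence can be sketched as follows, assuming the "second preset condition" is a minimum occurrence count (an assumption for illustration; the disclosure leaves the condition open):

```python
from collections import Counter

def select_candidate_labels(labels_with_scores, second_threshold, min_count):
    """Keep first candidate face labels whose third similarity score
    exceeds the second set threshold, count how often each label occurs,
    and return those occurring at least `min_count` times (the concrete
    condition is an illustrative assumption)."""
    second_candidates = [label for label, score in labels_with_scores
                         if score > second_threshold]
    counts = Counter(second_candidates)
    return [label for label, n in counts.items() if n >= min_count]
```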
On the basis of the above embodiment, the determination process of the target text is as follows:
identifying characters appearing in the current image frame to obtain a first text;
recognizing voice data corresponding to a target video containing the current image frame to obtain a second text;
and splicing the first text and the second text to obtain the target text.
On the basis of the above embodiment, the third determining module 33 includes:
a first comprehensive score determining unit, configured to determine a first comprehensive score corresponding to the candidate face tag according to the first similarity score and the relevance score;
a comprehensive score determining unit, configured to, if the first comprehensive score is greater than or equal to a third set threshold, take the first comprehensive score as a comprehensive score corresponding to the candidate face tag; otherwise, determining the quality score of the face to be recognized, and determining a comprehensive score corresponding to the candidate face label according to the quality score and the first comprehensive score;
the target candidate face label determining unit is used for sequentially arranging corresponding candidate face labels according to the comprehensive scores and taking the candidate face labels with the comprehensive scores larger than a set threshold value as target candidate face labels;
and the display unit is used for displaying the sequencing result of the target candidate face label.
On the basis of the above embodiment, the comprehensive score determining unit is specifically configured to:
if the quality score is smaller than a fourth set threshold value, determining the first comprehensive score as a comprehensive score corresponding to the candidate face label;
and if the quality score is larger than a fifth set threshold value, linearly combining the first comprehensive score and the quality score to obtain a comprehensive score corresponding to the candidate face label.
On the basis of the above embodiment, the current image frame includes an image frame obtained from a video production end.
The face recognition device provided by the embodiment of the disclosure and the face recognition method provided by the embodiment belong to the same inventive concept, and technical details which are not described in detail in the embodiment can be referred to the embodiment, and the embodiment has the same beneficial effects as the face recognition method.
EXAMPLE IV
Referring now to fig. 5, a block diagram of an electronic device 400 suitable for implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., car navigation terminals), and fixed terminals such as digital TVs, desktop computers, and servers. The electronic device shown in fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in fig. 5, electronic device 400 may include a processing device (e.g., central processing unit, graphics processor, etc.) 401 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 402 or a program loaded from a storage device 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data necessary for the operation of the electronic apparatus 400 are also stored. The processing device 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Generally, the following devices may be connected to the I/O interface 405: input devices 406 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 407 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 408 including, for example, tape, hard disk, etc.; and a communication device 409. The communication means 409 may allow the electronic device 400 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates an electronic device 400 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 409, or from the storage device 408, or from the ROM 402. The computer program performs the above-described functions defined in the methods of the embodiments of the present disclosure when executed by the processing device 401.
EXAMPLE V
The computer readable medium described above in this disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame, wherein the candidate face label is a face label corresponding to a face feature in a face image library; determining a relevance score of the candidate face label and the target text; and determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score.
Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a module does not constitute a limitation on the module itself in some cases, for example, the first determining module may also be described as a "module that determines a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized, and a target text corresponding to the current image frame".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, there is provided a face recognition method including:
determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame, wherein the candidate face label is a face label corresponding to a face feature in a face image library;
determining a relevance score of the candidate face label and the target text;
and determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score.
According to one or more embodiments of the present disclosure, in a face recognition method provided by the present disclosure, the determining a candidate face label corresponding to a face to be recognized in a current image frame and a first similarity score of the candidate face label and the face to be recognized includes:
extracting the human face features to be recognized of the human face to be recognized;
determining a second similarity score of the face features to be recognized and reference face features, wherein the reference face features are face features in the face image library, and the face image library is used for storing the corresponding relation between the reference face features and face labels;
and determining a candidate face label and a first similarity score of the candidate face label and the face to be recognized according to the title information of the current image frame and the second similarity score.
According to one or more embodiments of the present disclosure, in a face recognition method provided by the present disclosure, the determining a candidate face label and a first similarity score between the candidate face label and the face to be recognized according to the title information of the current image frame and the second similarity score includes:
determining, as a first reference face feature, a reference face feature whose second similarity score with the face to be recognized is greater than a first set threshold, and taking a first face label corresponding to the first reference face feature as a first candidate face label;
determining the matching degree of the first candidate face label and the title information;
determining a third similarity score of the first candidate face label and the face to be recognized according to the matching degree;
and determining a candidate face label according to the third similarity score and the first candidate face label, and taking the third similarity score corresponding to the candidate face label as the first similarity score of the candidate face label and the face to be recognized.
According to one or more embodiments of the present disclosure, in the face recognition method provided by the present disclosure, the determining a candidate face label according to the third similarity score and the first candidate face label includes:
taking the first candidate face label corresponding to the third similarity score larger than the second set threshold value as a second candidate face label;
counting the number of occurrences of the same face among the second candidate face labels;
and determining a second candidate face label with the occurrence frequency meeting a second preset condition as a candidate face label.
According to one or more embodiments of the present disclosure, in the face recognition method provided by the present disclosure, the determination process of the target text is as follows:
identifying characters appearing in the current image frame to obtain a first text;
recognizing voice data corresponding to a target video containing the current image frame to obtain a second text;
and splicing the first text and the second text to obtain the target text.
According to one or more embodiments of the present disclosure, in the face recognition method provided by the present disclosure, the determining a target candidate face label according to the first similarity score and the correlation score, as a recognition result of the face to be recognized, includes:
determining a first comprehensive score corresponding to the candidate face label according to the first similarity score and the correlation score;
if the first comprehensive score is larger than or equal to a third set threshold value, taking the first comprehensive score as a comprehensive score corresponding to the candidate face label; otherwise, determining the quality score of the face to be recognized, and determining a comprehensive score corresponding to the candidate face label according to the quality score and the first comprehensive score;
sequentially arranging corresponding candidate face labels according to the comprehensive scores, and taking the candidate face labels with the comprehensive scores larger than a set threshold value as target candidate face labels;
and displaying the sequencing result of the target candidate face label.
According to one or more embodiments of the present disclosure, in the face recognition method provided by the present disclosure, the determining a composite score corresponding to the candidate face label according to the quality score and the first composite score includes:
if the quality score is smaller than a fourth set threshold value, determining the first comprehensive score as a comprehensive score corresponding to the candidate face label;
and if the quality score is larger than a fifth set threshold value, linearly combining the first comprehensive score and the quality score to obtain a comprehensive score corresponding to the candidate face label.
According to one or more embodiments of the present disclosure, in the face recognition method provided by the present disclosure, the current image frame includes an image frame obtained from a video production end.
According to one or more embodiments of the present disclosure, there is provided a face recognition apparatus including:
the system comprises a first determining module, a second determining module and a third determining module, wherein the first determining module is used for determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score of the candidate face label and the face to be recognized and a target text corresponding to the current image frame, and the candidate face label is a face label corresponding to a face feature in a face image library;
a second determination module for determining a relevance score of the candidate face label and the target text;
and the third determining module is used for determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the correlation score.
In accordance with one or more embodiments of the present disclosure, there is provided an electronic device including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, implement any of the face recognition methods provided in the present disclosure.
According to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, the program, when executed by a processor, implementing any of the face recognition methods as provided by the present disclosure.
The foregoing description is only of the preferred embodiments of the present disclosure and an illustration of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of the features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (9)

1. A face recognition method, comprising:
determining a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score between the candidate face label and the face to be recognized, and a target text corresponding to the current image frame, wherein the candidate face label is a face label corresponding to a face feature in a face image library;
determining a relevance score between the candidate face label and the target text; and
determining a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the relevance score;
wherein determining the candidate face label corresponding to the face to be recognized in the current image frame and the first similarity score between the candidate face label and the face to be recognized comprises:
extracting face features to be recognized from the face to be recognized;
determining a second similarity score between the face features to be recognized and reference face features, wherein the reference face features are face features in the face image library, and the face image library stores correspondences between reference face features and face labels; and
determining the candidate face label and the first similarity score between the candidate face label and the face to be recognized according to title information of the current image frame and the second similarity score;
wherein the target text is determined as follows:
recognizing characters appearing in the current image frame to obtain a first text;
recognizing voice data corresponding to a target video containing the current image frame to obtain a second text; and
concatenating the first text and the second text to obtain the target text.
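The flow of claim 1 can be illustrated with a minimal sketch. All helper names (`build_target_text`, `recognize`), the weighted-sum combination, and the weight `alpha` are illustrative assumptions; the claim fixes neither the combination formula nor any implementation detail:

```python
def build_target_text(frame_text: str, speech_text: str) -> str:
    # Claim 1: the target text is the concatenation ("splicing") of the text
    # recognized in the image frame and the transcript of the video's speech.
    return frame_text + " " + speech_text

def recognize(candidates, relevance_scores, alpha=0.5):
    # candidates: list of (label, first_similarity_score) pairs.
    # relevance_scores: label -> relevance of that label to the target text.
    # The result depends on both scores; a weighted sum is one possible
    # combination (assumed here for illustration).
    best = max(candidates,
               key=lambda c: alpha * c[1] + (1 - alpha) * relevance_scores[c[0]])
    return best[0]

target = build_target_text("Interview with Alice", "welcome Alice to the show")
labels = [("Alice", 0.82), ("Bob", 0.79)]
rel = {"Alice": 0.9, "Bob": 0.1}
print(recognize(labels, rel))  # prints "Alice"
```

Note how the text relevance breaks the near-tie between the two visually similar candidates: the label mentioned in the frame text and speech wins.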
2. The method of claim 1, wherein determining the candidate face label and the first similarity score between the candidate face label and the face to be recognized according to the title information of the current image frame and the second similarity score comprises:
determining first reference face features whose second similarity score with the face to be recognized is greater than a first set threshold, and taking first face labels corresponding to the first reference face features as first candidate face labels;
determining a matching degree between each first candidate face label and the title information;
determining a third similarity score between each first candidate face label and the face to be recognized according to the matching degree; and
determining the candidate face label according to the third similarity scores and the first candidate face labels, and taking the third similarity score corresponding to the candidate face label as the first similarity score between the candidate face label and the face to be recognized.
3. The method of claim 2, wherein determining the candidate face label according to the third similarity scores and the first candidate face labels comprises:
taking first candidate face labels whose third similarity score is greater than a second set threshold as second candidate face labels;
counting the number of occurrences of the same face among the second candidate face labels; and
determining, as the candidate face label, a second candidate face label whose number of occurrences satisfies a second preset condition.
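Claims 2 and 3 describe a two-stage filter: first threshold the per-feature similarity scores, then keep only labels whose occurrence count satisfies a condition. A sketch of that filter follows; the threshold value and the "at least `min_count` occurrences" condition are illustrative assumptions, since the claims leave both unspecified:

```python
from collections import Counter

def select_candidates(scored_labels, score_threshold=0.7, min_count=2):
    # Stage 1 (claim 3): keep labels whose third similarity score exceeds
    # the second set threshold.
    second_candidates = [lbl for lbl, s in scored_labels if s > score_threshold]
    # Stage 2: count how often the same face label occurs among the second
    # candidates, and keep those meeting the occurrence condition.
    counts = Counter(second_candidates)
    return [lbl for lbl, n in counts.items() if n >= min_count]

scores = [("Alice", 0.9), ("Alice", 0.8), ("Bob", 0.75), ("Bob", 0.6), ("Carol", 0.95)]
print(select_candidates(scores))  # ['Alice'] — Bob and Carol each pass only once
```

Requiring repeated high-scoring matches suppresses one-off spurious matches from a single reference feature.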
4. The method according to claim 1, wherein determining the target candidate face label as the recognition result of the face to be recognized according to the first similarity score and the relevance score comprises:
determining a first comprehensive score corresponding to the candidate face label according to the first similarity score and the relevance score;
if the first comprehensive score is greater than or equal to a third set threshold, taking the first comprehensive score as a comprehensive score corresponding to the candidate face label; otherwise, determining a quality score of the face to be recognized, and determining the comprehensive score corresponding to the candidate face label according to the quality score and the first comprehensive score;
ranking the candidate face labels by their comprehensive scores, and taking candidate face labels whose comprehensive score is greater than a set threshold as target candidate face labels; and
displaying the ranking result of the target candidate face labels.
5. The method of claim 4, wherein determining the comprehensive score corresponding to the candidate face label according to the quality score and the first comprehensive score comprises:
if the quality score is smaller than a fourth set threshold, determining the first comprehensive score as the comprehensive score corresponding to the candidate face label; and
if the quality score is greater than a fifth set threshold, linearly combining the first comprehensive score and the quality score to obtain the comprehensive score corresponding to the candidate face label.
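The scoring logic of claims 4 and 5 can be sketched as follows. All threshold values and the linear-combination weight `w` are illustrative assumptions (the claims specify neither), and the behavior between the fourth and fifth thresholds, on which the claims are silent, is here assumed to keep the first comprehensive score:

```python
def composite_score(first_score, quality_score,
                    third_thr=0.8, fourth_thr=0.3, fifth_thr=0.6, w=0.7):
    # Claim 4: if the first comprehensive score clears the third set
    # threshold, use it directly.
    if first_score >= third_thr:
        return first_score
    # Claim 5: otherwise fall back on the face quality score.
    if quality_score < fourth_thr:   # low-quality face: keep the first score
        return first_score
    if quality_score > fifth_thr:    # good face: linear combination of the two
        return w * first_score + (1 - w) * quality_score
    return first_score               # between the thresholds: assumed unchanged

print(composite_score(0.9, 0.5))  # 0.9 — above the third threshold
print(composite_score(0.5, 0.2))  # 0.5 — low quality, first score kept
print(composite_score(0.5, 0.9))  # ≈ 0.62 — 0.7*0.5 + 0.3*0.9
```

The effect is that a borderline match on a sharp, well-lit face is boosted above one on a blurry face, while confident matches bypass the quality check entirely.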
6. The method of any of claims 1-5, wherein the current image frame comprises an image frame derived from a video obtained from a video production end.
7. A face recognition apparatus, comprising:
a first determining module configured to determine a candidate face label corresponding to a face to be recognized in a current image frame, a first similarity score between the candidate face label and the face to be recognized, and a target text corresponding to the current image frame, wherein the candidate face label is a face label corresponding to a face feature in a face image library;
a second determining module configured to determine a relevance score between the candidate face label and the target text; and
a third determining module configured to determine a target candidate face label as a recognition result of the face to be recognized according to the first similarity score and the relevance score;
wherein the first determining module comprises:
a feature extraction unit configured to extract face features to be recognized from the face to be recognized;
a second similarity score determining unit configured to determine a second similarity score between the face features to be recognized and reference face features, wherein the reference face features are face features in the face image library, and the face image library stores correspondences between reference face features and face labels; and
a first similarity score determining unit configured to determine the candidate face label and the first similarity score between the candidate face label and the face to be recognized according to title information of the current image frame and the second similarity score;
wherein the target text is determined as follows:
recognizing characters appearing in the current image frame to obtain a first text;
recognizing voice data corresponding to a target video containing the current image frame to obtain a second text; and
concatenating the first text and the second text to obtain the target text.
8. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the face recognition method of any one of claims 1-6.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the face recognition method according to any one of claims 1 to 6.
CN202011380912.8A 2020-11-30 2020-11-30 Face recognition method, device, equipment and storage medium Active CN112364829B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011380912.8A CN112364829B (en) 2020-11-30 2020-11-30 Face recognition method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112364829A CN112364829A (en) 2021-02-12
CN112364829B true CN112364829B (en) 2023-03-24





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant