CN110674483A - Identity recognition method based on multi-mode information - Google Patents

Identity recognition method based on multi-mode information

Info

Publication number
CN110674483A
Authority
CN
China
Prior art keywords
model
score
data set
face
identity recognition
Prior art date
Legal status
Granted
Application number
CN201910749103.0A
Other languages
Chinese (zh)
Other versions
CN110674483B (en)
Inventor
管贻生
叶家杰
Current Assignee
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201910749103.0A
Publication of CN110674483A
Application granted
Publication of CN110674483B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/70Multimodal biometrics, e.g. combining information from different biometric modalities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention discloses an identity recognition method based on multi-modal information, which comprises the following steps: step one, making a labelled multi-modal video data set; step two, constructing and training a face detection model and a head detection model; step three, constructing and training feature extraction models for the face, the head and the voice; step four, extracting features of the face, head and voice information with the trained feature extraction models; step five, constructing and training classification models to classify the three extracted features separately; step six, predicting results with the classification models using the three features; step seven, fusing the classification results according to a formulated multi-modal information fusion strategy; step eight, sorting the fused results and outputting an identity recognition result. The invention provides an identity recognition network model based on multi-modal information, which has wide application prospects in fields such as human-computer interaction, information security and security monitoring.

Description

Identity recognition method based on multi-mode information
Technical Field
The invention relates to the technical field of pattern recognition and biometric recognition, and in particular to an identity recognition method based on multi-modal information.
Background
With economic development and accumulated experience, technological innovation has advanced greatly. In recent decades in particular, a series of new technologies represented by biometric identification has progressed rapidly, and among identity recognition methods, face recognition has attracted the most attention. Face recognition identifies a target by collecting and analysing facial features; it is easy to sample, convenient to operate in the background, and requires no contact with the subject. Compared with other recognition modes it therefore has clear advantages in practical applications, plays a prominent role in identity recognition and intelligent human-computer interaction, and exerts considerable influence on fields such as security monitoring and multimedia entertainment.
Driven by the recent interest in deep learning, research on identity recognition has grown rapidly; in face recognition and speaker recognition in particular, performance on public data sets has surpassed human recognition ability. Building on the continuous optimisation of single-modality identity recognition algorithms, researchers have gradually shifted from constrained to unconstrained environments, which greatly increases the difficulty of identity recognition; improving recognition in unconstrained environments remains a hard problem in current research. In many unconstrained environments, single-modality information alone cannot complete the identity recognition task, and multiple modalities must be considered together to improve recognition. Identity recognition methods based on multi-modal information are therefore an important research direction.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide an identity recognition method based on multi-modal information.
The purpose of the invention is realized by the following technical scheme:
an identity recognition method based on multi-modal information comprises the following steps:
step one, collecting video clips of movie stars and well-known people, making a person data set containing multi-modal information, and adding identity labels to the data set;
step two, constructing detection models for the face and the head, training them respectively with different open-source data sets, and detecting the face and the head in the person data set from step one;
step three, constructing feature extraction models for the face, head and voice modalities from the face and head information detected in step two, and training the models with open-source data sets;
step four, extracting features of the face, head and voice information with the feature extraction models from step three;
step five, constructing a classification model and training it with the training set and validation set of the person data set from step one;
step six, using the classification models from step five to predict results on the test set of the person data set from step one;
step seven, performing information fusion on the prediction results from step six according to a formulated fusion strategy;
and step eight, sorting according to the fusion result from step seven and outputting the final identity recognition result.
Preferably, the specific process in step one of creating a person data set containing multi-modal information and adding identity labels to the data set is as follows:
constructing and training a face detection-score and quality-score evaluation model, and evaluating the face detection score (range 0-1) and quality score (range 0-200) of a large number of collected videos; screening the videos with this model and randomly cutting them into video segments of 3-30 seconds, such that 80% of the video data in the whole data set are high-score segments and 20% are low-score segments, with a further 5% of unknown-label segments added to the data set.
Preferably, in step two a face detection model is constructed according to the PyramidBox algorithm and trained with the open-source data sets MegaFace and MS-Celeb-1M; the head detection model is YOLOv3, which uses open-source pre-trained weights and detects only the position of a person's head.
Preferably, the feature extraction model for the face in step three is a neural-network feature extractor based on the VGG16 structure and the ArcFace loss function, trained with the open-source data sets MegaFace and MS-Celeb-1M; the ArcFace loss function is shown in the following formula (1):

$$L_{1}=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}\tag{1}$$

in the above formula, N denotes the batch size of the input data, s denotes the radius of the hypersphere, m denotes the additive angular margin penalty, θ_{y_i} denotes the angle between the i-th sample feature and the weight of its true class, and θ_j denotes the included angle between the j-th column weight and the i-th sample feature;
the feature extraction models of the human face and the head have the same neural network structure and the same loss function, but the network parameters are not shared;
the feature extraction model for the voice is a neural network based on ResNet50 with a softmax loss in the last layer, trained with the open-source data set VoxCeleb2.
Preferably, in step four feature extraction is performed on the person data set from step one with the face, head and voice feature extraction models, and the output of the penultimate fully-connected layer, which has 512 nodes, is taken as the extracted feature.
Preferably, the classification model in step five is a multilayer perceptron with three fully-connected layers: the first and second layers have 1024 nodes each and the third layer has as many nodes as there are classes; the classification models are trained only with the three kinds of modal information extracted from the training set and the validation set, one classifier per modality.
Preferably, in step six the classification models predict results on the test set of the person data set; three sets of prediction results are produced, one each from the face, head and voice classification models.
Preferably, the fusion strategy in step seven performs information fusion at the decision layer, obtaining the fusion result by weighted averaging; the choice of weights is split into two cases: when the face detection score and quality score are high, the detection score and quality score of the face are used as weights, and in all other cases the ranking score of the prediction results is used as the weight;
specifically, the weight selection is divided into two parts according to the face detection score and quality score: high-score videos are classified through the first part, and low-score videos through the second part;
the fusion strategy of the first part uses the detection score and the quality score as weights to compute a weighted average, as shown in the following formula (2):

$$F=\frac{\sum_{i=1}^{n}\left(qua\_score_i\cdot det\_score_i\right)f_i}{\sum_{i=1}^{n}qua\_score_i\cdot det\_score_i}\tag{2}$$

in the above formula, qua_score_i denotes the quality score of the i-th frame image, det_score_i denotes the detection score of the i-th frame image, n denotes the number of frames contained in the currently input video, f_i denotes the feature of the i-th frame in the current video, and F denotes the composite feature obtained by the weighted average;
the fusion strategy of the second part uses the three prediction results for decision fusion: video IDs with the same prediction result are accumulated under each label, and a weighted average is obtained through the ranking scores, as shown in formulas (3) and (4):

[formulas (3) and (4), defining the ranking score rank_score_j and the weight score W, appear only as equation images in the original document]

in the above formulas, label_i denotes the i-th label, result_score_j denotes the j-th prediction result, rank_score_j denotes the ranking score of the j-th prediction result, m denotes the number of prediction results with the same label and the same video ID among all prediction results, W denotes the weight score of the same label and the same video ID, N denotes the number of classification categories in the data set, and k denotes the number of video IDs contained under the same label.
Preferably, in step eight the fusion results from step seven are sorted by label category in the data set, the Top-K method is used to select from the ranking, and the identity recognition result is output according to the sorted result.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention provides a method for making a multi-modal information data set, solving the technical problem of screening multi-modal data that meets requirements from a large volume of data;
(2) the invention provides an effective multi-modal information fusion model, solving the problem that identity recognition cannot be performed from single-modality information in real unconstrained environments, for example when accurate face recognition is impossible because an image is overexposed, shows only a profile, or the face is occluded;
(3) the invention provides a method for fusing multiple prediction results based on a weighted mean, combined with K-fold stratified sampling of the data set; this strengthens the prediction results, improves prediction accuracy, and alleviates the drop in accuracy that easily occurs when results are fused at the decision layer.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic flow chart of the multi-modal information dataset production of the present invention;
FIG. 3 is a schematic flow chart of the multi-modal feature extraction of the present invention;
FIG. 4 is a schematic diagram of a model structure of the fusion policy model according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in fig. 1 to 4, an identity recognition method based on multi-modal information includes the following steps:
Step one, making a labelled person video data set with multi-modal information, where the multi-modal information comprises the face, the head, the voice and the like;
as shown in fig. 2, a face detection score evaluation and quality evaluation model is constructed and trained, a large number of videos obtained from a network are subjected to face detection score evaluation and quality score evaluation, the detection score range is 0-1, the quality score range is 0-200, the videos are screened through the face detection score evaluation and quality evaluation model, the videos are randomly cut into video segments of 3-30 seconds, the detection score is greater than 0.8, the quality score is greater than 80, the video segments are high-score video segments, the other video segments are low-score video segments, 80% of video data in the whole data set are high-score video segments, 20% of the video segments are low-score video segments, and 5% of unknown label video segments are added in the data set.
Step two, constructing and training the face detection model and the head detection model respectively; the two models have different neural network structures, the face detection model is trained with an open-source data set, and the head detection model uses open-source pre-trained weights;
(1) construct the face detection model according to the PyramidBox algorithm and train it with the open-source data sets MegaFace and MS-Celeb-1M.
(2) the head detection model is YOLOv3; it uses open-source pre-trained weights and detects only the position of the person's head.
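A minimal sketch of how the detectors' raw outputs might feed the later steps; the detection tuple format and the "head" class name are assumptions, not the patent's interface:

```python
def keep_heads(detections, conf_thresh=0.5):
    """Keep only head detections above a confidence threshold.
    detections: list of (class_name, confidence, (x1, y1, x2, y2))."""
    return [d for d in detections if d[0] == "head" and d[1] >= conf_thresh]

def crop_region(frame, box):
    """Crop a detected face/head box out of a frame array (H x W x C)."""
    x1, y1, x2, y2 = (int(v) for v in box)
    return frame[y1:y2, x1:x2]
```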
Step three, constructing and training the feature extraction models for the face, the head and the voice; the face and head feature extractors are neural networks with the VGG16 structure and the ArcFace loss function, trained with the open-source data sets MegaFace and MS-Celeb-1M; the voice extraction model is based on the ResNet50 neural network, its penultimate layer has 512 nodes, its last-layer loss function is softmax, and it is trained with the open-source data set VoxCeleb2; the ArcFace loss function is shown in formula (1):

$$L_{1}=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}\tag{1}$$

in the above formula, N denotes the batch size of the input data, s denotes the radius of the hypersphere, m denotes the additive angular margin penalty, θ_{y_i} denotes the angle between the i-th sample feature and the weight of its true class, and θ_j denotes the included angle between the j-th column weight and the i-th sample feature.
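A sketch of the ArcFace head corresponding to formula (1), written in PyTorch since the patent names no framework; the defaults s=64, m=0.5 follow the ArcFace paper, not values stated here:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Scaled cosine logits with an additive angular margin m on the true class."""
    def __init__(self, feat_dim, num_classes, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, features, labels):
        # cos(theta_j): cosine between each sample feature and each class weight
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        # add the angular margin m only at the ground-truth class y_i
        one_hot = F.one_hot(labels, cosine.size(1)).float()
        logits = self.s * torch.cos(theta + self.m * one_hot)
        # cross-entropy over the re-scaled logits realises formula (1)
        return F.cross_entropy(logits, labels)
```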
Step four, extracting features of the face, head and voice information with the trained feature extraction models: the face and head detection models from step two and the three feature extraction models from step three are applied to the person data set from step one, and the output of the penultimate layer of each feature extraction model is taken as the extracted feature; the specific feature extraction process is shown in fig. 3.
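One way to read out the penultimate-layer features, sketched with a PyTorch forward hook; the layer handle passed in and the batch format are assumptions:

```python
import torch

def extract_penultimate(model, layer, batch):
    """Capture the 512-d output of the penultimate layer via a forward hook.
    `layer` is that nn.Module inside `model` (e.g. the second-to-last Linear)."""
    captured = {}
    def hook(_module, _inputs, output):
        captured["feat"] = output.detach()
    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        model(batch)          # run a normal forward pass; the hook fires inside
    handle.remove()
    return captured["feat"]
```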
Step five, constructing and training the classification models to classify the three extracted features separately; each classification model is a multilayer perceptron with three fully-connected layers, where the first and second layers have 1024 nodes each, the number of nodes in the third (output) layer equals the number of classes in the data set, and the loss function of the last layer is softmax; the classification models are trained only with the three kinds of modal information extracted from the training set and the validation set, one classifier per modality.
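A minimal PyTorch sketch of this per-modality classifier; feat_dim=512 follows the extracted features, while num_classes is a placeholder for the data set's class count:

```python
import torch.nn as nn

class ModalityClassifier(nn.Module):
    """Three fully-connected layers: 1024, 1024, then num_classes outputs.
    Softmax is applied by the training loss (nn.CrossEntropyLoss)."""
    def __init__(self, feat_dim=512, num_classes=1000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```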
Step six, predicting results with the classification models using the three features, specifically: the person data set is split into K folds by stratified sampling with the K-fold method, and each of the three classification models predicts results on the K folds, yielding 3 × K prediction results.
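A sketch of this stratified K-fold prediction pass using scikit-learn; the sklearn-style .predict interface on the three classifiers is an assumption:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def kfold_predictions(models, features, labels, k=5):
    """Stratified K-fold split of the person data set; each of the three
    modality models predicts on every fold, giving K x 3 result arrays."""
    features, labels = np.asarray(features), np.asarray(labels)
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    all_preds = []
    for _, test_idx in skf.split(features, labels):
        fold_preds = [m.predict(features[test_idx]) for m in models]
        all_preds.append(fold_preds)
    return all_preds  # K folds, each with 3 per-model prediction arrays
```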
Step seven, performing information fusion on the classification results according to the formulated multi-modal information fusion strategy; the specific structure is shown in fig. 4;
the fusion strategy is divided into two parts according to the face detection score and quality score: high-score videos are classified through the first part, and low-score videos through the second part.
The fusion strategy of the first part uses the detection score and the quality score as weights to compute a weighted average, as shown in the following formula (2):

$$F=\frac{\sum_{i=1}^{n}\left(qua\_score_i\cdot det\_score_i\right)f_i}{\sum_{i=1}^{n}qua\_score_i\cdot det\_score_i}\tag{2}$$

in the above formula, qua_score_i denotes the quality score of the i-th frame image, det_score_i denotes the detection score of the i-th frame image, n denotes the number of frames contained in the currently input video, f_i denotes the feature of the i-th frame in the current video, and F denotes the composite feature obtained by the weighted average.
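A NumPy sketch of formula (2); reading the per-frame weight as the product of detection and quality scores is our interpretation of "detection score and quality score as weights":

```python
import numpy as np

def fuse_frame_features(feats, det_scores, qua_scores):
    """Weighted average of per-frame features: each frame's weight is its
    detection score times its quality score, normalised over all n frames."""
    w = np.asarray(det_scores) * np.asarray(qua_scores)  # shape (n,)
    feats = np.asarray(feats)                            # shape (n, d)
    return (w[:, None] * feats).sum(axis=0) / w.sum()    # composite feature F
```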
The fusion strategy of the second part uses the three prediction results for decision fusion: video IDs with the same prediction result are accumulated under each label, and a weighted average is obtained through the ranking scores, as shown in formulas (3) and (4):

[formulas (3) and (4), defining the ranking score rank_score_j and the weight score W, appear only as equation images in the original document]

in the above formulas, label_i denotes the i-th label, result_score_j denotes the j-th prediction result, rank_score_j denotes the ranking score of the j-th prediction result, m denotes the number of prediction results with the same label and the same video ID among all prediction results, W denotes the weight score of the same label and the same video ID, N denotes the number of classification categories in the data set, and k denotes the number of video IDs contained under the same label.
Step eight, using the fusion result obtained in step seven, sorting by the weight score under each label, and finally outputting the identity recognition result according to the Top-K method.
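The final step reduces to sorting the fused weight scores and keeping the Top-K entries; a one-function sketch over the fused scores from the previous step:

```python
def top_k_identities(scores, k=5):
    """Sort fused (label, video_id) weight scores and keep the Top-K."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```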
The invention provides a method for making a multi-modal information data set, solving the technical problem of screening multi-modal data that meets requirements from a large volume of data; an effective multi-modal information fusion model, solving the problem that identity recognition cannot be performed from single-modality information in real unconstrained environments, for example when an image is overexposed, shows only a profile, or the face is occluded; and a method for fusing multiple prediction results based on a weighted mean, combined with K-fold stratified sampling of the data set, which strengthens the prediction results, improves prediction accuracy, and alleviates the drop in accuracy that easily occurs when results are fused at the decision layer.
The present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents and are included in the scope of the present invention.

Claims (9)

1. An identity recognition method based on multi-modal information, characterized by comprising the following steps:
step one, collecting video clips of movie stars and well-known people, making a person data set containing multi-modal information, and adding identity labels to the data set;
step two, constructing detection models for the face and the head, training them respectively with different open-source data sets, and detecting the face and the head in the person data set from step one;
step three, constructing feature extraction models for the face, head and voice modalities from the face and head information detected in step two, and training the models with open-source data sets;
step four, extracting features of the face, head and voice information with the feature extraction models from step three;
step five, constructing a classification model and training it with the training set and validation set of the person data set from step one;
step six, using the classification models from step five to predict results on the test set of the person data set from step one;
step seven, performing information fusion on the prediction results from step six according to a formulated fusion strategy;
and step eight, sorting according to the fusion result from step seven and outputting the final identity recognition result.
2. The multi-modal information-based identity recognition method according to claim 1, wherein the specific process in step one of creating a person data set containing multi-modal information and adding identity labels to the data set is as follows:
constructing and training a face detection-score and quality-score evaluation model, and evaluating the face detection score (range 0-1) and quality score (range 0-200) of a large number of collected videos; screening the videos with this model and randomly cutting them into video segments of 3-30 seconds, such that 80% of the video data in the whole data set are high-score segments and 20% are low-score segments, with a further 5% of unknown-label segments added to the data set.
3. The identity recognition method based on multi-modal information according to claim 1, wherein in step two a face detection model is constructed according to the PyramidBox algorithm and trained with the open-source data sets MegaFace and MS-Celeb-1M; the head detection model is YOLOv3, which uses open-source pre-trained weights and detects only the position of a person's head.
4. The identity recognition method based on multi-modal information according to claim 1, wherein the feature extraction model for the face in step three is a neural-network feature extractor based on the VGG16 structure and the ArcFace loss function, trained with the open-source data sets MegaFace and MS-Celeb-1M; wherein the ArcFace loss function is shown in the following formula (1):

$$L_{1}=-\frac{1}{N}\sum_{i=1}^{N}\log\frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)}+\sum_{j\neq y_i}e^{s\cos\theta_j}}\tag{1}$$

in the above formula, N denotes the batch size of the input data, s denotes the radius of the hypersphere, m denotes the additive angular margin penalty, θ_{y_i} denotes the angle between the i-th sample feature and the weight of its true class, and θ_j denotes the included angle between the j-th column weight and the i-th sample feature;
the feature extraction models of the human face and the head have the same neural network structure and the same loss function, but the network parameters are not shared;
the feature extraction model for the voice is a neural network based on ResNet50 with a softmax loss in the last layer, trained with the open-source data set VoxCeleb2.
5. The method of claim 1, wherein in step four the person data set from step one is processed with the face, head and voice feature extraction models, and the output of the penultimate fully-connected layer, which has 512 nodes, is taken as the extracted feature.
6. The identity recognition method based on multi-modal information according to claim 1, wherein the classification model in step five is a multilayer perceptron with three fully-connected layers: the first and second layers have 1024 nodes each and the third layer has as many nodes as there are classes; the classification models are trained only with the three kinds of modal information extracted from the training set and the validation set, one classifier per modality.
7. The identity recognition method based on multi-modal information as claimed in claim 1, wherein in step six the classification models predict results on the test set of the person data set; three sets of prediction results are produced, one each from the face, head and voice classification models.
8. The identity recognition method based on multi-modal information according to claim 1, wherein the fusion strategy in step seven performs information fusion at the decision layer, obtaining the fusion result by weighted averaging; the choice of weights is split into two cases: when the face detection score and quality score are high, the detection score and quality score of the face are used as weights, and in all other cases the ranking score of the prediction results is used as the weight;
specifically, the weight selection is divided into two parts according to the face detection score and quality score: high-score videos are classified through the first part, and low-score videos through the second part;
the fusion strategy of the first part uses the detection score and the quality score as weights to compute a weighted average, as shown in the following formula (2):

$$F=\frac{\sum_{i=1}^{n}\left(qua\_score_i\cdot det\_score_i\right)f_i}{\sum_{i=1}^{n}qua\_score_i\cdot det\_score_i}\tag{2}$$

in the above formula, qua_score_i denotes the quality score of the i-th frame image, det_score_i denotes the detection score of the i-th frame image, n denotes the number of frames contained in the currently input video, f_i denotes the feature of the i-th frame in the current video, and F denotes the composite feature obtained by the weighted average;
the fusion strategy of the second part uses the three prediction results for decision fusion: video IDs with the same prediction result are accumulated under each label, and a weighted average is obtained through the ranking scores, as shown in formulas (3) and (4):

[formulas (3) and (4), defining the ranking score rank_score_j and the weight score W, appear only as equation images in the original document]

in the above formulas, label_i denotes the i-th label, result_score_j denotes the j-th prediction result, rank_score_j denotes the ranking score of the j-th prediction result, m denotes the number of prediction results with the same label and the same video ID among all prediction results, W denotes the weight score of the same label and the same video ID, N denotes the number of classification categories in the data set, and k denotes the number of video IDs contained under the same label.
9. The multi-modal information based identity recognition method according to claim 1, wherein in step eight the fusion results from step seven are sorted by label category in the data set, the Top-K method is used to select from the ranking, and the identity recognition result is output according to the sorted result.
CN201910749103.0A 2019-08-14 2019-08-14 Identity recognition method based on multi-mode information Active CN110674483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910749103.0A CN110674483B (en) 2019-08-14 2019-08-14 Identity recognition method based on multi-mode information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910749103.0A CN110674483B (en) 2019-08-14 2019-08-14 Identity recognition method based on multi-mode information

Publications (2)

Publication Number Publication Date
CN110674483A true CN110674483A (en) 2020-01-10
CN110674483B CN110674483B (en) 2022-05-13

Family

ID=69068584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910749103.0A Active CN110674483B (en) 2019-08-14 2019-08-14 Identity recognition method based on multi-mode information

Country Status (1)

Country Link
CN (1) CN110674483B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507311A (en) * 2020-05-22 2020-08-07 南京大学 Video character recognition method based on multi-mode feature fusion depth network
CN111862990A (en) * 2020-07-21 2020-10-30 苏州思必驰信息科技有限公司 Speaker identity verification method and system
CN112215257A (en) * 2020-09-14 2021-01-12 德清阿尔法创新研究院 Multi-person multi-modal perception data automatic marking and mutual learning method
CN112818175A (en) * 2021-02-07 2021-05-18 中国矿业大学 Factory worker searching method and training method of worker recognition model
CN112989967A (en) * 2021-02-25 2021-06-18 复旦大学 Personnel identity identification method based on audio and video information fusion
WO2021204086A1 (en) * 2020-04-06 2021-10-14 华为技术有限公司 Identity authentication method, and method and device for training identity authentication model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
US20030110038A1 (en) * 2001-10-16 2003-06-12 Rajeev Sharma Multi-modal gender classification using support vector machines (SVMs)
US20170039357A1 (en) * 2015-08-03 2017-02-09 Samsung Electronics Co., Ltd. Multi-modal fusion method for user authentication and user authentication method
CN108648746A (en) * 2018-05-15 2018-10-12 南京航空航天大学 A kind of open field video natural language description generation method based on multi-modal Fusion Features

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
US20030110038A1 (en) * 2001-10-16 2003-06-12 Rajeev Sharma Multi-modal gender classification using support vector machines (SVMs)
US20170039357A1 (en) * 2015-08-03 2017-02-09 Samsung Electronics Co., Ltd. Multi-modal fusion method for user authentication and user authentication method
CN108648746A (en) * 2018-05-15 2018-10-12 南京航空航天大学 A kind of open field video natural language description generation method based on multi-modal Fusion Features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Y. Liu et al., "iQIYI-VID: A large dataset for multi-modal person identification", arXiv preprint arXiv:1811.07548 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021204086A1 (en) * 2020-04-06 2021-10-14 华为技术有限公司 Identity authentication method, and method and device for training identity authentication model
CN111507311A (en) * 2020-05-22 2020-08-07 南京大学 Video character recognition method based on multi-mode feature fusion depth network
CN111507311B (en) * 2020-05-22 2024-02-20 南京大学 Video character recognition method based on multi-mode feature fusion depth network
CN111862990A (en) * 2020-07-21 2020-10-30 苏州思必驰信息科技有限公司 Speaker identity verification method and system
CN112215257A (en) * 2020-09-14 2021-01-12 德清阿尔法创新研究院 Multi-person multi-modal perception data automatic marking and mutual learning method
CN112818175A (en) * 2021-02-07 2021-05-18 中国矿业大学 Factory worker searching method and training method of worker recognition model
CN112818175B (en) * 2021-02-07 2023-09-01 中国矿业大学 Factory staff searching method and training method of staff identification model
CN112989967A (en) * 2021-02-25 2021-06-18 复旦大学 Personnel identity identification method based on audio and video information fusion

Also Published As

Publication number Publication date
CN110674483B (en) 2022-05-13

Similar Documents

Publication Publication Date Title
CN110674483B (en) Identity recognition method based on multi-mode information
CN107330362B (en) Video classification method based on space-time attention
CN109376242B (en) Text classification method based on cyclic neural network variant and convolutional neural network
CN110084151B (en) Video abnormal behavior discrimination method based on non-local network deep learning
CN111275085A (en) Online short video multi-modal emotion recognition method based on attention fusion
CN109949317A (en) Based on the semi-supervised image instance dividing method for gradually fighting study
CN111144448A (en) Video barrage emotion analysis method based on multi-scale attention convolutional coding network
CN109190479A (en) A kind of video sequence expression recognition method based on interacting depth study
CN106650806A (en) Cooperative type deep network model method for pedestrian detection
CN108681712A (en) A kind of Basketball Match Context event recognition methods of fusion domain knowledge and multistage depth characteristic
CN108171184A (en) Method for distinguishing is known based on Siamese networks again for pedestrian
CN105787458A (en) Infrared behavior identification method based on adaptive fusion of artificial design feature and depth learning feature
CN111581385A (en) Chinese text type identification system and method for unbalanced data sampling
Jing et al. Yarn-dyed fabric defect classification based on convolutional neural network
CN110348416A (en) Multi-task face recognition method based on multi-scale feature fusion convolutional neural network
CN108427740B (en) Image emotion classification and retrieval algorithm based on depth metric learning
CN113536922A (en) Video behavior identification method for weighting fusion of multiple image tasks
Ocquaye et al. Dual exclusive attentive transfer for unsupervised deep convolutional domain adaptation in speech emotion recognition
Islam et al. A review on video classification with methods, findings, performance, challenges, limitations and future work
CN108256307A (en) A kind of mixing enhancing intelligent cognition method of intelligent business Sojourn house car
CN105930792A (en) Human action classification method based on video local feature dictionary
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN113255602A (en) Dynamic gesture recognition method based on multi-modal data
CN112329438A (en) Automatic lie detection method and system based on domain confrontation training
CN112732921A (en) False user comment detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant