CN116597810A - Identity recognition method, identity recognition device, computer equipment and storage medium

Info

Publication number
CN116597810A
CN116597810A (application CN202310714968.XA)
Authority
CN
China
Prior art keywords
user
features
voice
comparison result
voiceprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310714968.XA
Other languages
Chinese (zh)
Inventor
贺亚运
王健宗
彭俊清
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202310714968.XA
Publication of CN116597810A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/005: Language recognition
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/06: Decision making techniques; Pattern matching strategies
    • G10L 17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The present application relates to the fields of voiceprint recognition and digital healthcare, and in particular to an identity recognition method, apparatus, computer device and storage medium. The method comprises the following steps: acquiring video data of a user to be identified; extracting voiceprint features from the video data through a voiceprint recognition model to obtain the voiceprint features of the user to be identified; extracting accent features from the video data through an accent recognition model to obtain the accent features of the user to be identified; comparing the joint features obtained by splicing the voiceprint features and the accent features with all sample voice features in a registration database to generate a voice comparison result; and determining the identity information corresponding to the user to be identified according to the voice comparison result. Because the application performs identity recognition on the joint features of the voice data of the user to be identified, it takes into account both the speaker's timbre information and the speaker's accent information, so the extracted voice features of the speaker are more complete and accurate, and the accuracy of identity recognition can be improved.

Description

Identity recognition method, identity recognition device, computer equipment and storage medium
Technical Field
The present application relates to the fields of voiceprint recognition and digital healthcare, and in particular to an identity recognition method, apparatus, computer device and storage medium.
Background
With the rapid development of artificial intelligence technology, scenarios that require identity recognition are becoming more common. For example, in digital-healthcare scenarios such as appointment registration and remote consultation, identity recognition must be performed. Existing identity recognition technology generally relies on image recognition; specifically, a face image or a fingerprint image of the user is acquired and recognized. Identity recognition based on user images has been widely applied and achieves good recognition results.
However, in some scenarios the effect of identity recognition by image recognition is still not ideal. For example, on a rainy day the acquired face image has low definition, so misjudgment easily occurs and recognition accuracy is low. In other scenarios no image can be acquired at all, so identity recognition by image recognition is impossible. An identity recognition method distinct from image recognition is therefore needed.
Disclosure of Invention
Based on this, it is necessary to provide an identity recognition method, an identity recognition device, a computer device and a storage medium to solve the problem that existing identity recognition technology has low recognition accuracy or cannot perform recognition at all.
An identity recognition method, comprising:
acquiring video data of a user to be identified;
extracting voiceprint features of the video data through a voiceprint recognition model to obtain voiceprint features of the user to be recognized;
extracting accent features of the video data through an accent recognition model to obtain accent features of the user to be recognized;
splicing the voiceprint features and the accent features to obtain joint features;
comparing the joint features with all sample voice features in a registration database to generate a voice comparison result;
and determining identity information corresponding to the user to be identified according to the voice comparison result.
An identity recognition device, comprising:
the video data module is used for acquiring video data of the user to be identified;
the voiceprint feature module is used for extracting voiceprint features of the video data through a voiceprint recognition model to obtain voiceprint features of the user to be recognized;
the accent feature module is used for extracting accent features of the video data through an accent recognition model to obtain accent features of the user to be recognized;
the joint feature module is used for performing splicing processing on the voiceprint feature and the accent feature to obtain a joint feature;
the voice comparison result module is used for comparing the joint characteristics with all sample voice characteristics in the registration database to generate a voice comparison result;
and the identity information module is used for determining the identity information corresponding to the user to be identified according to the voice comparison result.
A computer device comprising a memory, a processor and computer readable instructions stored in the memory and executable on the processor, the processor implementing the identification method described above when executing the computer readable instructions.
One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the identification method as described above.
The identity recognition method, identity recognition device, computer device and storage medium acquire video data of a user to be identified; extract voiceprint features from the video data through a voiceprint recognition model to obtain the voiceprint features of the user to be identified; extract accent features from the video data through an accent recognition model to obtain the accent features of the user to be identified; splice the voiceprint features and the accent features to obtain joint features; compare the joint features with all sample voice features in a registration database to generate a voice comparison result; and determine the identity information corresponding to the user to be identified according to the voice comparison result. Because the application performs identity recognition on the joint features of the voice data of the user to be identified, it takes into account both the speaker's timbre information and the speaker's accent information, so the extracted voice features of the speaker are more complete and accurate, and the accuracy of identity recognition can be improved. The identity recognition method can be applied to digital-healthcare scenarios such as appointment registration and remote consultation; improving recognition accuracy in these scenarios in turn improves the efficiency of registration and consultation.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the present application, and that a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application environment of an identity recognition method according to an embodiment of the present application;
FIG. 2 is a flow chart of an identity recognition method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an identity recognition device according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the application.
The identity recognition method provided by this embodiment can be applied to the application environment shown in Fig. 1, in which a client communicates with a server. Clients include, but are not limited to, personal computers, notebook computers, smart phones, tablet computers and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
In an embodiment, as shown in Fig. 2, an identity recognition method is provided. The method is illustrated as applied to the server in Fig. 1 and includes the following steps:
s10, acquiring video data of the user to be identified.
Video data here means data containing both an audio stream and a video stream of the user to be identified; it may be captured by a camera device. The user to be identified is the user whose identity needs to be recognized.
S20, voiceprint feature extraction is carried out on the video data through a voiceprint recognition model, and voiceprint features of the user to be recognized are obtained.
Understandably, the voiceprint recognition model is a trained deep learning neural network model, for example a CNN (convolutional neural network) or an RNN (recurrent neural network). Voiceprint feature extraction is the process of encoding the audio information in the video data through the voiceprint recognition model and extracting the voiceprint features of the user to be identified. The voiceprint features characterize the timbre information of the user to be identified.
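As a concrete illustration, the following is a minimal voiceprint encoder sketch in PyTorch. The layer sizes, the 80-band mel-spectrogram input and the 256-dimensional embedding are illustrative assumptions; the patent does not fix an architecture beyond naming CNN and RNN models.

    import torch
    import torch.nn as nn

    class VoiceprintEncoder(nn.Module):
        """Minimal CNN voiceprint encoder sketch (hypothetical sizes).

        Input:  mel-spectrogram of shape (batch, 1, n_mels=80, frames)
        Output: L2-normalized voiceprint embedding of shape (batch, 256)
        """
        def __init__(self, embedding_dim: int = 256):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((1, 1)),  # pool over frequency and time
            )
            self.proj = nn.Linear(64, embedding_dim)

        def forward(self, mel: torch.Tensor) -> torch.Tensor:
            x = self.conv(mel).flatten(1)              # (batch, 64)
            x = self.proj(x)                           # (batch, embedding_dim)
            return nn.functional.normalize(x, dim=1)   # unit-length embedding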
S30, extracting the accent features of the video data through an accent recognition model to obtain the accent features of the user to be recognized.
Understandably, the accent recognition model is likewise a trained deep learning neural network model, for example a CNN (convolutional neural network) or an RNN (recurrent neural network). Accent feature extraction is the process of encoding the audio information in the video data through the accent recognition model and extracting the accent features of the user to be identified. The accent features characterize the accent information of the user to be identified.
Preferably, the accent recognition model is trained on speech from speakers across the country, spoken in locally accented Mandarin. That is, locally accented Mandarin from all over the country is used as the training data set of the initial accent recognition model so that it learns the accent information of speakers nationwide; the trained accent recognition model is thereby able to extract the accent features of the user to be identified and to perform accent recognition. The initial accent recognition model is an untrained deep learning neural network model.
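A minimal training sketch follows. The patent only states that nationwide locally accented Mandarin serves as training data; the plain accent-classification objective (one label per region) and the region count are assumptions for illustration.

    import torch.nn as nn

    NUM_REGIONS = 34  # assumption: one accent label per province-level region

    class AccentRecognizer(nn.Module):
        def __init__(self, encoder: nn.Module, feat_dim: int = 256):
            super().__init__()
            self.encoder = encoder                 # any CNN/RNN audio encoder
            self.head = nn.Linear(feat_dim, NUM_REGIONS)

        def forward(self, mel):
            feat = self.encoder(mel)               # accent feature vector
            return feat, self.head(feat)           # features + region logits

    def train_step(model, optimizer, mel, region_labels):
        """One supervised step over a batch of accented-Mandarin clips."""
        _, logits = model(mel)
        loss = nn.CrossEntropyLoss()(logits, region_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

After training, the classification head can be discarded and the encoder output used directly as the accent feature.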
S40, splicing the voiceprint features and the accent features to obtain joint features.
Understandably, after the voiceprint features and the accent features of the user to be identified are obtained, the two are spliced to obtain joint features that contain both the timbre information and the accent information of the user to be identified.
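In vector terms the splicing is a plain concatenation along the feature dimension. The 256-dimensional embeddings below are illustrative sizes only:

    import torch

    voiceprint = torch.randn(1, 256)  # from the voiceprint recognition model
    accent     = torch.randn(1, 256)  # from the accent recognition model

    # Splicing: concatenate along the feature dimension to form the joint feature.
    joint = torch.cat([voiceprint, accent], dim=1)  # shape (1, 512)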
S50, comparing the joint features with all sample voice features in a registration database to generate a voice comparison result.
A registration database is a database for storing user registration information. User registration information is the personal information a user provides at registration, including but not limited to identity information, face information, fingerprint information and voice information. For example, when a user registers with a bank, video of the user is usually captured to obtain and store face information and voice information as registration information. The identity information of each user is stored in one-to-one correspondence with that user's face, fingerprint and voice information. A sample voice feature is the voice feature extracted from a piece of sample voice information stored in the registration database; the sample voice information is the voice recorded when the user registered. The voice comparison result is the result of comparing the joint features against all sample voice features in the registration database. It may include the voice similarity values between the joint features and all sample voice features, the maximum of those similarity values, the sample voice feature corresponding to that maximum, and the ranking numbers of the similarity values. A voice similarity value is the similarity between the joint features and one sample voice feature. A voice ranking number is the position of a similarity value after all similarity values are sorted in descending order.
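A minimal sketch of building the voice comparison result follows. Cosine similarity is an assumption here; the patent only speaks of a similarity model (see steps S501-S503 below).

    import numpy as np

    def voice_comparison(joint: np.ndarray, samples: dict[str, np.ndarray]) -> dict:
        """Compare a joint feature against every sample voice feature.

        samples maps a user id to that user's enrolled voice feature;
        cosine similarity stands in for the unspecified similarity model.
        """
        sims = {
            user_id: float(joint @ feat /
                           (np.linalg.norm(joint) * np.linalg.norm(feat)))
            for user_id, feat in samples.items()
        }
        # Ranking numbers: user ids sorted by similarity, descending.
        ranking = sorted(sims, key=sims.get, reverse=True)
        return {
            "similarities": sims,
            "ranking": ranking,
            "max_value": sims[ranking[0]],
            "best_match": ranking[0],
        }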
S60, according to the voice comparison result, determining identity information corresponding to the user to be identified.
Understandably, the identity information corresponding to the user to be identified can be determined quickly from the voice comparison result. Specifically, the voice similarity value at a preset voice ranking number is taken from the voice comparison result, and the sample voice information corresponding to that similarity value is determined as the target voice information. The identity information corresponding to the target voice information is then retrieved and used as the identity information of the user to be identified.
Preferably, there may be one or more preset voice ranking numbers. When there are several, for example 3, the voice similarity values ranked in the top three are taken, the three corresponding pieces of sample voice information are determined as target voice information, and the three corresponding pieces of identity information are output as candidate identities of the user to be identified so that the user can confirm among them, which increases recognition accuracy.
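Continuing the sketch above, selecting the top-k candidates (k = 3 matches the example) is a slice over the ranking; the identities mapping is a hypothetical stand-in for the registration-database lookup:

    def top_k_identities(result: dict, identities: dict[str, str], k: int = 3) -> list[str]:
        """Return the identity information of the k best-ranked enrolled users."""
        return [identities[user_id] for user_id in result["ranking"][:k]]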
In steps S10-S60, video data of a user to be identified is acquired; voiceprint features are extracted from the video data through a voiceprint recognition model to obtain the voiceprint features of the user to be identified; accent features are extracted from the video data through an accent recognition model to obtain the accent features of the user to be identified; the voiceprint features and the accent features are spliced to obtain joint features; the joint features are compared with all sample voice features in a registration database to generate a voice comparison result; and the identity information corresponding to the user to be identified is determined according to the voice comparison result. In this embodiment, identity recognition is performed on the joint features of the user's voice data, so both the speaker's timbre information and the speaker's accent information are taken into account, the extracted voice features of the speaker are more complete and accurate, and the accuracy of identity recognition can be improved. The identity recognition method can be applied to digital-healthcare scenarios such as appointment registration and remote consultation; improving recognition accuracy in these scenarios in turn improves the efficiency of registration and consultation.
Optionally, step S20, namely performing voiceprint feature extraction on the video data through the voiceprint recognition model to obtain the voiceprint features of the user to be identified, includes:
s201, dividing the video data through a video dividing technology to obtain a video stream and an audio stream;
s202, carrying out voiceprint encoding on the audio stream through a voiceprint encoder of the voiceprint recognition model to obtain voiceprint characteristics of the user to be recognized.
The video segmentation technique is a technique for splitting video data into the video stream and the audio stream it contains. A video stream is data comprising a number of video frames; an audio stream is data comprising a number of audio frames. The voiceprint encoder performs voiceprint encoding on the audio frames in the audio stream to obtain the voiceprint features.
In steps S201 and S202, the audio stream of the user to be identified is separated from the video data by the video segmentation technique before the voiceprint features are extracted, which improves the accuracy of voiceprint feature extraction.
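One common way to realize the segmentation of step S201 is to demux the container with ffmpeg. The sketch below assumes ffmpeg is installed; the output file names and the 16 kHz mono format are illustrative choices:

    import subprocess

    def split_streams(video_path: str) -> tuple[str, str]:
        """Split video data into a video-only stream and an audio-only stream."""
        video_out, audio_out = "stream_video.mp4", "stream_audio.wav"
        # Copy the video track, dropping audio (-an).
        subprocess.run(["ffmpeg", "-y", "-i", video_path, "-an",
                        "-c:v", "copy", video_out], check=True)
        # Extract the audio track (-vn) as 16 kHz mono PCM for the encoders.
        subprocess.run(["ffmpeg", "-y", "-i", video_path, "-vn",
                        "-ac", "1", "-ar", "16000", audio_out], check=True)
        return video_out, audio_out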
Optionally, after step S60, that is, after the identity information corresponding to the user to be identified is obtained according to the voice comparison result, the method includes:
s601, extracting face features of the video stream through a face recognition model to obtain the face features of the user to be recognized;
s602, comparing the face features with all sample face features in the registration database to generate a face comparison result.
S603, determining identity information corresponding to the user to be identified according to the voice comparison result and the face comparison result.
Understandably, the face recognition model is a trained deep learning neural network model used to recognize face images in the video stream and extract face features. Face feature extraction is the process of encoding the face images in the video data through the face recognition model and extracting the face features of the user to be identified. A sample face feature is the face feature extracted from a piece of sample face information stored in the registration database; the sample face information is the face information recorded when the user registered. The face comparison result is the result of comparing the face features of the user to be identified against all sample face features in the registration database. It may include the face similarity values between the user's face features and all sample face features, the maximum of those values, the sample face feature corresponding to that maximum, and the ranking numbers of the face similarity values. A face similarity value is the similarity between the user's face features and one sample face feature; a face ranking number is the position of a similarity value after all face similarity values are sorted in descending order.
In steps S601-S603, face recognition is performed on the video stream in the video data to obtain a face comparison result. The identity information of the user to be identified is then determined from both the face comparison result and the voice comparison result, so that not only the voice features but also the face features of the user to be identified are taken into account, further improving the accuracy of identity recognition.
Optionally, in step S603, the determining, according to the voice comparison result and the face comparison result, identity information corresponding to the user to be identified includes:
s6031, acquiring first identity information corresponding to the target voice characteristics according to the voice comparison result; acquiring second identity information corresponding to the target face features according to the face comparison result;
s6032, judging whether the first identity information and the second identity information are the same information or not to obtain a judging result;
and S6033, determining identity information corresponding to the user to be identified according to the judging result.
Understandably, the target voice feature is the sample voice feature matched to the joint features of the user to be identified. Specifically, the voice similarity value at a preset voice ranking number is taken from the voice comparison result, and the sample voice feature corresponding to that value is determined as the target voice feature; the preset voice ranking number is a ranking position specified in advance (for example, the first-ranked result). The first identity information is the identity information corresponding to the target voice feature, retrieved from the registration database according to that feature. The target face feature is the sample face feature matched to the face features of the user to be identified; a sample face feature is the face feature extracted from a piece of sample face information stored in the registration database, and the sample face information is the face information recorded when the user registered. Analogously, the face similarity value at a preset face ranking number is taken from the face comparison result, and the corresponding sample face feature is determined as the target face feature. The second identity information is the identity information corresponding to the target face feature, retrieved from the registration database according to that feature. The judgment result states either that the first identity information and the second identity information are the same or that they are different. When they are the same, the first identity information (equivalently, the second) is determined as the identity information of the user to be identified.
In this embodiment, the identity information of the user to be identified is doubly confirmed by the first identity information determined from the voice comparison result and the second identity information determined from the face comparison result, which improves the accuracy of identity recognition.
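The dual confirmation reduces to an equality check; the sketch below, with hypothetical helper names, also shows the hand-off to the fingerprint fallback of steps S60331-S60334:

    def decide_identity(first_id: str, second_id: str) -> str | None:
        """Dual confirmation of steps S6031-S6033.

        first_id comes from the voice comparison result, second_id from
        the face comparison result. Returns the confirmed identity, or
        None to signal that the fingerprint fallback is needed.
        """
        if first_id == second_id:   # same information: identity confirmed
            return first_id
        return None                 # different information: keep checking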
Optionally, in step S6033, determining the identity information corresponding to the user to be identified according to the judgment result includes:
s60331, if the judging result indicates that the first identity information and the second identity information are different information, acquiring fingerprint information of the user to be identified;
s60332, extracting fingerprint characteristics of the fingerprint information through a fingerprint identification model to obtain the fingerprint characteristics of the user to be identified;
s60333, comparing the fingerprint features with all sample fingerprint features in the registration database to generate a fingerprint comparison result.
S60334, determining identity information corresponding to the user to be identified according to the fingerprint comparison result, the voice comparison result and the face comparison result.
Understandably, when the first identity information and the second identity information differ, the identity of the user to be identified is still uncertain and must be recognized further to guarantee accuracy. Fingerprint information is information about the fingerprint of the user to be identified. The fingerprint comparison result is generated on the same principle as the voice comparison result and is not described again here. The identity information of the user to be identified is then confirmed multiple times using the third identity information determined from the fingerprint comparison result, the first identity information determined from the voice comparison result and the second identity information determined from the face comparison result.
Preferably, when the first identity information determined from the voice comparison result and the third identity information determined from the fingerprint comparison result are the same, that information is determined as the identity information of the user to be identified; likewise, when the second identity information determined from the face comparison result and the third identity information determined from the fingerprint comparison result are the same, that information is determined as the identity information of the user to be identified.
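Taken together, the preferred rule amounts to returning whichever identity the fingerprint result agrees with; the following sketch is an assumed formalization of that rule:

    def multi_modal_identity(first_id: str, second_id: str, third_id: str) -> str | None:
        """Multiple confirmation of steps S60331-S60334 (assumed rule).

        first_id: from the voice comparison result
        second_id: from the face comparison result
        third_id: from the fingerprint comparison result
        """
        if first_id == third_id:
            return first_id     # voice and fingerprint agree
        if second_id == third_id:
            return second_id    # face and fingerprint agree
        return None             # no agreement: identity not confirmed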
In this embodiment, multiple confirmations are performed on the identity information of the user to be identified through the third identity information determined from the fingerprint comparison result, the first identity information determined from the voice comparison result and the second identity information determined from the face comparison result, which improves the accuracy of identity recognition.
Optionally, in step S50, comparing the joint features with all sample voice features in the registration database to generate a voice comparison result includes:
s501, calculating the similarity between the joint features and each sample voice feature in a registration database through a similarity model to obtain a plurality of voice similarity values;
s502, acquiring a sample voice feature corresponding to the maximum value in all the voice similarity values as the target voice feature;
s503, generating the voice comparison result according to the maximum value and the target voice characteristic.
Understandably, the similarity model is used to calculate the similarity between the joint features and each sample voice feature in the registration database. A voice similarity value is the similarity between the joint features and one sample voice feature, and the maximum is the largest of all the voice similarity values. The voice comparison result includes that maximum and the sample voice feature corresponding to it, that is, the target voice feature.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the order in which the processes are executed should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
In an embodiment, an identity recognition device is provided, which corresponds one-to-one to the identity recognition method in the foregoing embodiment. As shown in Fig. 3, the identity recognition device includes a video data module 10, a voiceprint feature module 20, an accent feature module 30, a joint feature module 40, a voice comparison result module 50 and an identity information module 60. The functional modules are described in detail as follows:
a video data module 10, configured to obtain video data of a user to be identified;
the voiceprint feature module 20 is configured to extract voiceprint features of the video data through a voiceprint recognition model, so as to obtain voiceprint features of the user to be recognized;
the accent feature module 30 is configured to extract accent features of the video data through an accent recognition model, so as to obtain accent features of the user to be recognized;
a joint feature module 40, configured to splice the voiceprint feature and the accent feature to obtain a joint feature;
the voice comparison result module 50 is configured to compare the joint feature with all the sample voice features in the registration database to generate a voice comparison result;
and the identity information module 60 is configured to determine identity information corresponding to the user to be identified according to the voice comparison result.
The voiceprint feature module 20 includes:
the video data segmentation unit is used for carrying out segmentation processing on the video data through a video segmentation technology to obtain a video stream and an audio stream;
and the voiceprint feature unit is used for carrying out voiceprint encoding on the audio stream through a voiceprint encoder of the voiceprint recognition model to obtain voiceprint features of the user to be recognized.
Optionally, the identification device further includes:
the face feature module is used for extracting face features of the video stream through a face recognition model to obtain the face features of the user to be recognized;
and the face comparison result module is used for comparing the face characteristics with all sample face characteristics in the registration database to generate a face comparison result.
The identity information module 60 is further configured to determine identity information corresponding to the user to be identified according to the voice comparison result and the face comparison result.
Optionally, the identity information module 60 includes:
the identity information acquisition unit is used for acquiring first identity information corresponding to the target voice characteristics according to the voice comparison result; acquiring second identity information corresponding to the target face features according to the face comparison result;
the judgment result unit is used for judging whether the first identity information and the second identity information are the same information to obtain a judgment result;
and the identity information unit is used for determining the identity information corresponding to the user to be identified according to the judgment result.
Optionally, the identity information unit includes:
the fingerprint information unit is used for acquiring the fingerprint information of the user to be identified if the judgment result indicates that the first identity information and the second identity information are different information;
the fingerprint feature unit is used for extracting fingerprint features of the fingerprint information through a fingerprint identification model to obtain fingerprint features of the user to be identified;
and the fingerprint comparison result unit is used for comparing the fingerprint characteristics with all sample fingerprint characteristics in the registration database to generate a fingerprint comparison result.
And the identity information confirmation unit is used for determining the identity information corresponding to the user to be identified according to the fingerprint comparison result, the voice comparison result and the face comparison result.
Optionally, the voice comparison result module 50 includes:
the voice similarity value unit is used for calculating the similarity between the joint features and each sample voice feature in the registration database through a similarity model to obtain a plurality of voice similarity values;
a target voice feature unit, configured to obtain a sample voice feature corresponding to a maximum value of all the voice similarity values as the target voice feature;
and the voice comparison result unit is used for generating the voice comparison result according to the maximum value and the target voice characteristic.
For specific limitations on the identity recognition device, reference may be made to the limitations on the identity recognition method above, which are not repeated here. Each module in the identity recognition device may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded in or independent of a processor of the computer device in hardware form, or stored in software form in a memory of the computer device, so that the processor can invoke them to perform the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in Fig. 4. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer readable instructions and a database. The internal memory provides an environment for the operating system and the computer readable instructions in the readable storage medium to run. The database of the computer device stores data related to the identity recognition method. The network interface of the computer device communicates with external terminals through a network connection. The computer readable instructions, when executed by the processor, implement the identity recognition method. The readable storage media provided by this embodiment include non-volatile readable storage media and volatile readable storage media.
In one embodiment, a computer device is provided, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor. When executing the computer readable instructions, the processor performs the steps of:
acquiring video data of a user to be identified;
extracting voiceprint features of the video data through a voiceprint recognition model to obtain voiceprint features of the user to be recognized;
extracting accent features of the video data through an accent recognition model to obtain accent features of the user to be recognized;
splicing the voiceprint features and the accent features to obtain joint features;
comparing the joint features with all sample voice features in a registration database to generate a voice comparison result;
and determining identity information corresponding to the user to be identified according to the voice comparison result.
In one embodiment, one or more computer readable storage media storing computer readable instructions are provided; the readable storage media provided by this embodiment include non-volatile readable storage media and volatile readable storage media. The computer readable instructions, when executed by one or more processors, perform the steps of:
acquiring video data of a user to be identified;
extracting voiceprint features of the video data through a voiceprint recognition model to obtain voiceprint features of the user to be recognized;
extracting accent features of the video data through an accent recognition model to obtain accent features of the user to be recognized;
splicing the voiceprint features and the accent features to obtain joint features;
comparing the joint features with all sample voice features in a registration database to generate a voice comparison result;
and determining identity information corresponding to the user to be identified according to the voice comparison result.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by computer readable instructions instructing the relevant hardware; the instructions may be stored on a non-volatile or volatile readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, database or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It will be clear to those skilled in the art that, for convenience and brevity of description, only the division of the above functional units and modules is illustrated. In practical applications, the above functions may be allocated to different functional units and modules as required; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be included within the scope of the application.

Claims (10)

1. An identity recognition method, comprising:
acquiring video data of a user to be identified;
extracting voiceprint features of the video data through a voiceprint recognition model to obtain voiceprint features of the user to be recognized;
extracting accent features of the video data through an accent recognition model to obtain accent features of the user to be recognized;
splicing the voiceprint features and the accent features to obtain joint features;
comparing the joint features with all sample voice features in a registration database to generate a voice comparison result;
and determining identity information corresponding to the user to be identified according to the voice comparison result.
2. The identity recognition method according to claim 1, wherein extracting voiceprint features from the video data through a voiceprint recognition model to obtain the voiceprint features of the user to be identified comprises:
dividing the video data by a video dividing technology to obtain a video stream and an audio stream;
and carrying out voiceprint encoding on the audio stream through a voiceprint encoder of the voiceprint recognition model to obtain voiceprint characteristics of the user to be recognized.
3. The identity recognition method according to claim 2, wherein after determining the identity information corresponding to the user to be identified according to the voice comparison result, the method comprises:
extracting face features of the video stream through a face recognition model to obtain the face features of the user to be recognized;
comparing the face features with all sample face features in the registration database to generate a face comparison result; and
determining identity information corresponding to the user to be identified according to the voice comparison result and the face comparison result.
4. The identity recognition method according to claim 3, wherein the determining identity information corresponding to the user to be identified according to the voice comparison result and the face comparison result comprises:
acquiring first identity information corresponding to the target voice characteristics according to the voice comparison result; acquiring second identity information corresponding to the target face features according to the face comparison result;
judging whether the first identity information and the second identity information are the same information to obtain a judgment result; and
determining identity information corresponding to the user to be identified according to the judgment result.
5. The identity recognition method according to claim 4, wherein determining identity information corresponding to the user to be identified according to the judgment result comprises:
if the judgment result indicates that the first identity information and the second identity information are different information, acquiring fingerprint information of the user to be identified;
extracting fingerprint characteristics of the fingerprint information through a fingerprint identification model to obtain the fingerprint characteristics of the user to be identified;
comparing the fingerprint features with all sample fingerprint features in the registration database to generate a fingerprint comparison result; and
determining identity information corresponding to the user to be identified according to the fingerprint comparison result, the voice comparison result and the face comparison result.
6. The identity recognition method according to claim 4, wherein comparing the joint features with all sample voice features in a registration database to generate a voice comparison result comprises:
calculating the similarity between the joint features and each sample voice feature in the registration database through a similarity model to obtain a plurality of voice similarity values;
acquiring a sample voice characteristic corresponding to the maximum value in all the voice similarity values as the target voice characteristic;
and generating the voice comparison result according to the maximum value and the target voice characteristic.
7. An identity recognition device, comprising:
the video data module is used for acquiring video data of the user to be identified;
the voiceprint feature module is used for extracting voiceprint features of the video data through a voiceprint recognition model to obtain voiceprint features of the user to be recognized;
the accent feature module is used for extracting accent features of the video data through an accent recognition model to obtain accent features of the user to be recognized;
the joint feature module is used for performing splicing processing on the voiceprint feature and the accent feature to obtain a joint feature;
the voice comparison result module is used for comparing the joint characteristics with all sample voice characteristics in the registration database to generate a voice comparison result;
and the identity information module is used for determining the identity information corresponding to the user to be identified according to the voice comparison result.
8. The identity recognition device of claim 7, wherein the voiceprint feature module comprises:
the video data segmentation unit is used for carrying out segmentation processing on the video data through a video segmentation technology to obtain a video stream and an audio stream;
and the voiceprint feature unit is used for carrying out voiceprint encoding on the audio stream through a voiceprint encoder of the voiceprint recognition model to obtain voiceprint features of the user to be recognized.
9. A computer device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer readable instructions, implements the identity recognition method of any one of claims 1 to 6.
10. One or more readable storage media storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the identity recognition method of any one of claims 1 to 6.
CN202310714968.XA 2023-06-15 2023-06-15 Identity recognition method, identity recognition device, computer equipment and storage medium Pending CN116597810A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310714968.XA CN116597810A (en) 2023-06-15 2023-06-15 Identity recognition method, identity recognition device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310714968.XA CN116597810A (en) 2023-06-15 2023-06-15 Identity recognition method, identity recognition device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116597810A (en) 2023-08-15

Family

ID=87593808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310714968.XA Pending CN116597810A (en) 2023-06-15 2023-06-15 Identity recognition method, identity recognition device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116597810A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination