CN110556114B - Speaker identification method and device based on attention mechanism - Google Patents
- Publication number: CN110556114B (application CN201910684343.7A)
- Authority: CN (China)
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G10L17/04 — Speaker identification or verification techniques: training, enrolment or model building
- G10L17/18 — Speaker identification or verification techniques: artificial neural networks; connectionist approaches
- H04M1/656 — Recording arrangements for recording a message from the calling party: recording conversations
- H04M1/72403 — User interfaces specially adapted for cordless or mobile telephones, with means for local support of applications that increase the functionality
- H04M1/72454 — User interfaces specially adapted for cordless or mobile telephones, adapting the functionality of the device according to context-related or environment-related conditions
Abstract
The invention discloses a speaker identification method and device based on an attention mechanism, comprising the following steps: collecting the call recordings of a plurality of tested callers and of the test caller; establishing a speaker voice library from the call recordings corresponding to the tested callers; training an attention-based neural network on the tested callers' speech to obtain a trained model; saving the test caller's call recording to obtain a recording file; and identifying, from the recording file and using the trained model, whether the tested caller is the target speaker. Because the tested callers' speech is trained with an attention-based neural network and the resulting model identifies the tested caller, the method confirms that the dialed number is used by its owner, avoids the communication security risk of an impersonated caller identity, and thereby improves information security during calls.
Description
Technical Field
The invention relates to the field of voice recognition, in particular to a speaker recognition method and device based on an attention mechanism.
Background
Speech is the most direct and most common mode of communication in human life and work; a human voice carries semantic information, language or dialect information, channel information, and more. With the continuous progress of computer technology and the arrival of the network era, there are more and more means of disguising a speaker's identity.
Determining the speaker's identity during a call helps assess the call's security. Prior-art speaker recognition methods include using an approximation of the KL divergence as a measure of similarity between speakers, recognizing speakers with a BP neural network, and recognizing speakers with mixed MFCC and GFCC features. However, most application scenarios of these techniques are smart-home ones, such as a robot vacuum recognizing from the voice whether the speaker is its owner. The prior art provides no speaker recognition method for the call process and does not consider the silent portions of speech during a call.
Most prior-art voice recognition methods suit scenarios where the speech signal is captured directly from the environment; a method for recognizing the speaker during a phone call is lacking, leaving the security risk that a speaker's identity can be counterfeited.
Disclosure of Invention
The invention aims to provide a speaker identification method and device based on an attention mechanism to solve the technical problems.
In order to achieve the purpose, the invention provides the following scheme:
In a first aspect of the embodiments of the present invention, there is provided a speaker identification method based on an attention mechanism, including the following steps:
collecting the call recordings of a plurality of tested callers and of the test caller;
establishing a speaker voice library from the call recordings corresponding to the tested callers;
training an attention-based neural network on the tested callers' speech to obtain a trained model;
saving the test caller's call recording to obtain a recording file;
and identifying, from the recording file and using the trained model, whether the tested caller is the target speaker.
Optionally, the step of collecting the call recordings of the plurality of tested callers and the test caller includes:
the testing party records the tested caller's speech during the call using the smartphone's built-in recording function; when the system's call-recording function is used, the phone model, the storage format of the recording file, and the external environment characteristics during the call must be noted; and the call recording is saved as a lossless file in Wave format.
Optionally, the step of establishing a speaker voice library according to the call record corresponding to the tested speaker includes:
acquiring the correspondence between the tested caller's identity and the tested speech;
and establishing a speaker voice library from that correspondence, wherein the speaker voice library includes the identified party's call audio data, the tested party's identity information, the tested party's environmental characteristics, the testing party's recording-device information, the testing party's environmental characteristics, the call time, and the call volume.
Optionally, the step of training the tested caller's speech with an attention-based neural network to obtain a trained model includes:
denoising the recording file with a Wiener filter to obtain a preprocessed recording file;
and training an attention-based time-recurrent neural network on the preprocessed recording file to obtain the trained model.
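As a concrete illustration of the denoising step, the sketch below applies SciPy's `scipy.signal.wiener` to a synthetic noisy signal. The 440 Hz tone, noise level, and window size are illustrative choices, not values from the patent.

```python
import numpy as np
from scipy.signal import wiener

# Synthetic "call recording": a 440 Hz tone plus Gaussian noise, 8 kHz sampling.
fs = 8000
t = np.arange(fs) / fs                       # one second of audio
clean = np.sin(2 * np.pi * 440 * t)
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(fs)

# Wiener-filter the noisy signal; the window size (5 samples) is a tunable choice.
denoised = wiener(noisy, mysize=5)

# The filter preserves the signal length and lowers the error against the clean tone.
err_before = np.mean((noisy - clean) ** 2)
err_after = np.mean((denoised - clean) ** 2)
```

In the patent's flow, `noisy` would be the samples read from the Wave recording file rather than a synthetic tone.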
Optionally, the step of training the attention-based time-recurrent neural network on the preprocessed recording file to obtain the trained model includes:
extracting voice features from the preprocessed recording file through the input layer of the time-recurrent neural network to obtain Mel cepstral coefficient feature vectors for the speech in the file;
sending the Mel cepstral coefficient feature vectors to a fully connected layer (which can be regarded as an autoencoder) that extracts from them a second feature vector for the speech in the preprocessed recording file;
sending the second feature vector to an attention-based time-recurrent neural network layer composed of several LSTM layers, which process the second feature vector to obtain processed data;
and sending the processed data to a normalized exponential (softmax) function layer, which maps the processed data to a person's name, yielding the name corresponding to the processed data.
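The attention step above can be sketched in NumPy: each frame's hidden state receives a score, the scores are normalized with a softmax, and the weighted sum summarizes the whole utterance. This is a minimal additive-attention sketch with a fixed scoring vector; a real layer's weights would be learned, and the dimensions here are illustrative.

```python
import numpy as np

def attention_pool(hidden_states, w):
    """Additive attention over a sequence of LSTM hidden states.

    hidden_states: (T, H) array, one H-dimensional state per speech frame.
    w: (H,) scoring vector (fixed here for illustration; learned in practice).
    Returns (context, weights): the attention-weighted summary vector and
    the per-frame attention weights, which sum to 1.
    """
    scores = np.tanh(hidden_states) @ w          # (T,) one score per frame
    scores -= scores.max()                       # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    context = weights @ hidden_states            # (H,) weighted summary
    return context, weights

rng = np.random.default_rng(1)
T, H = 50, 64                                    # 50 frames, 64 hidden units
states = rng.standard_normal((T, H))
w = rng.standard_normal(H)
context, weights = attention_pool(states, w)
```

The context vector is what the softmax layer would then classify into a speaker name.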
Optionally, the step of identifying from the recording file, using the trained model, whether the tested caller is the target speaker includes:
judging whether the voice to be tested exists in the speaker voice library; if so, identifying the matching tested person in the library; otherwise, if the audio file to be recognized belongs to a new tested person, it is recognized as the closest existing tested person, i.e. the existing class with the maximum confidence value.
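The confidence-based decision can be sketched as follows: the normalized exponential (softmax) layer turns the network outputs into probabilities, and the result is always the enrolled class with maximal confidence — which is why a new, unenrolled speaker is mapped to the closest existing one. The speaker names and logit values below are hypothetical.

```python
import numpy as np

def identify(logits, names):
    """Map raw network outputs to a speaker name via the normalized
    exponential (softmax): the answer is always the existing class
    with the maximum confidence value."""
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())      # shift for numerical stability
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return names[best], float(probs[best])

names = ["Alice", "Bob", "Carol"]            # hypothetical enrolled speakers
name, conf = identify([0.2, 2.5, 0.1], names)
# A voice from an unenrolled speaker is still assigned to the closest
# enrolled speaker, i.e. whichever class scores highest.
```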
In order to achieve the above object, the present invention further provides the following solutions:
a speaker recognition apparatus based on an attention mechanism, comprising:
the collection module is used for collecting the call recordings of a plurality of tested callers and of the test caller;
the voice library establishing module is used for establishing a speaker voice library from the call recordings corresponding to the tested callers;
the training module is used for training an attention-based neural network on the tested callers' speech to obtain a trained model;
the file storage module is used for saving the test caller's call recording to obtain a recording file;
and the test module is used for identifying, from the recording file and using the trained model, whether the tested caller is the target speaker.
Optionally, the collecting module specifically includes:
the testing party unit is used for the testing party to record the tested caller's speech during the call using the smartphone's built-in recording function;
the recording unit is used for using the system's call-recording function during the call and noting the phone model, the storage format of the recording file, and the external environment characteristics during the call; the call recording is saved as a lossless file in Wave format.
Optionally, the voice library establishing module specifically includes:
the correspondence obtaining unit, used for acquiring the correspondence between the tested caller's identity and the tested speech;
and the voice library establishing unit, used for establishing a speaker voice library from that correspondence, wherein the speaker voice library includes the identified party's call audio data, the tested party's identity information, the tested party's environmental characteristics, the testing party's recording-device information, the testing party's environmental characteristics, the call time, and the call volume.
Optionally, the training module specifically includes:
the preprocessing unit, used for denoising the recording file with a Wiener filter to obtain a preprocessed recording file;
and the training model establishing unit, used for training an attention-based time-recurrent neural network on the preprocessed recording file to obtain the trained model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the invention discloses a speaker recognition method and a device based on an attention mechanism, which train the voice of a tested speaker by adopting a neural network based on attention to obtain a training model, recognize the tested speaker by adopting the training model, and can determine the speaker in a communication process by utilizing an existing communication voice library under the condition of only audio frequency so that a user can display the speaker to be matched with an actual speaker through a number, thereby judging the reliability, effectively defending acts such as fraud and the like by imitating the voice of the speaker, and indirectly protecting the communication safety of the user.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a speaker identification method based on an attention mechanism according to embodiment 1 of the present invention;
Fig. 2 is a flowchart of a speaker identification method based on an attention mechanism according to embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of a speaker identification device based on an attention mechanism according to embodiment 3 of the present invention;
fig. 4 is a diagram of a neural network structure provided by the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions are described below clearly and completely with reference to the drawings of the embodiments. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art from the described embodiments without inventive effort fall within the scope of protection of the invention.
Example 1
Embodiment 1 of the present invention provides an embodiment of a speaker recognition method based on an attention mechanism, and as shown in fig. 1, the method includes the following steps:
S101: collecting the call recordings of a plurality of tested callers and of the test caller;
S102: establishing a speaker voice library from the call recordings corresponding to the tested callers;
S103: training an attention-based neural network on the tested callers' speech to obtain a trained model;
S104: saving the test caller's call recording to obtain a recording file;
S105: identifying, from the recording file and using the trained model, whether the tested caller is the target speaker.
While answering, the testing party records the call and then associates the recording with the tested caller to build the tested caller's voice library;
the speaker's voice features are learned with an attention-based LSTM neural network to generate a speaker recognition model;
when identifying a caller, the testing party records the call and saves the recording as a Wave audio file;
the saved audio file is passed through a Wiener filter, Mel cepstral coefficients are extracted, and the features are input into the trained speaker recognition model to recognize the tested caller.
The attention-based time-recurrent neural network can be implemented directly with the open-source tool TensorFlow.
The network parameters used in the embodiment of the invention are: rnn_size, any positive integer, for example 64;
attn_length, any positive integer, for example 64. It is emphasized that the core of the present invention is the attention-based speaker identification method; modifications of network parameters and similar operations on the network are all encompassed by the present invention.
With the attention-based speaker recognition provided by this embodiment, the speaker in a call can be determined from audio alone using the existing call voice library, making it convenient for the user to check whether the displayed number matches the actual speaker, judge its reliability, and indirectly protect communication security.
Example 2
Embodiment 2 of the present invention provides a preferred embodiment of a speaker recognition method based on the attention mechanism. Referring to fig. 2, in this embodiment, the method includes the steps of:
s201: and collecting the call records of the users, and corresponding the call records to the tested caller to construct a caller voice library.
During the call, the testing party records the tested caller's speech using the smartphone's built-in recording device or a headset with a recording function.
An Android phone can use the system's call-recording function during the call; the phone model, the storage format of the recording file, and the external environment characteristics (e.g. quiet, noisy) must be noted. An iPhone does not provide a call-recording function because of the system's privacy settings; recording can instead be done through a headset with a recording function, and the headset brand and model, the storage format of the recording file, and the external environment characteristics (e.g. quiet, noisy) must be noted.
When the testing party is the calling party, the smartphone's built-in recording device or the recording headset can be switched on after the called party answers; when the testing party is the called party, it can be switched on upon answering the call.
The voice library associates the tested caller's identity with the tested speech; its purpose is to provide data for subsequent model training.
The voice library includes the identified party's call audio data; the identified party's identity information (e.g. phone number, name, location); the called party's environmental characteristics (e.g. indoor, street, store); the testing party's recording-device information (e.g. sampling frequency, noise-reduction characteristics, audio storage format); the testing party's environmental characteristics (e.g. indoor, street, store); the call duration; the call time; and the call volume.
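A record of the voice library described above might be modeled as a small data structure; the field names and values below are illustrative, not mandated by the patent.

```python
from dataclasses import dataclass, asdict

@dataclass
class VoiceLibraryEntry:
    """One record of the speaker voice library; field names are
    illustrative stand-ins for the items listed in the description."""
    audio_path: str            # Wave file holding the identified party's speech
    identity: dict             # phone number, name, location
    callee_environment: str    # e.g. "indoor", "street", "store"
    recorder_info: dict        # sampling frequency, noise reduction, audio format
    caller_environment: str    # testing party's environment
    call_duration_s: float
    call_time: str
    call_volume: str

entry = VoiceLibraryEntry(
    audio_path="rec/0001.wav",
    identity={"phone": "138****0000", "name": "Zhang San", "location": "Beijing"},
    callee_environment="indoor",
    recorder_info={"sample_rate": 44100, "denoise": "off", "format": "wave"},
    caller_environment="street",
    call_duration_s=185.0,
    call_time="2019-07-27 10:30",
    call_volume="normal",
)
record = asdict(entry)         # plain dict, ready for a database row
```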
S202: training the tested caller's speech with the attention-based neural network to generate a trained model.
Specifically, the voice files in the voice library are preprocessed: simple speech denoising with a Wiener filter avoids noise affecting the rest of the experimental process.
The neural network used is based on a time-recursive neural network of the attention mechanism, the network structure being shown in fig. 4.
S203: and in the test process, the test party records the call process and stores the recording file.
In particular, the saved audio file format is required to be lossless, such as WAVE, FLAC, APE, ALAC, WavPack, and the like.
WAVE typically represents sound with three parameters: quantization bits, sampling frequency, and sample amplitude. The quantization depth is 8, 16, or 24 bits; the channels are mono or stereo, with mono amplitude data stored as an n×1 matrix of points and stereo as an n×2 matrix; the sampling frequency is generally 11025 Hz (11 kHz), 22050 Hz (22 kHz), or 44100 Hz (44 kHz) — the higher rates give excellent sound quality but larger files. The encoding mode of the audio file is also recorded.
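The WAVE parameters just listed can be inspected with Python's standard-library `wave` module. The sketch writes a one-second 16-bit mono file at 22050 Hz, then reads back its quantization width, sampling frequency, and channel count.

```python
import wave, struct, math, tempfile, os

# Write a one-second 16-bit mono WAVE file at 22050 Hz.
rate, bits, channels = 22050, 16, 1
path = os.path.join(tempfile.mkdtemp(), "tone.wav")

with wave.open(path, "wb") as w:
    w.setnchannels(channels)
    w.setsampwidth(bits // 8)       # sample width in bytes (2 bytes = 16 bits)
    w.setframerate(rate)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / rate)))
        for n in range(rate)
    )
    w.writeframes(frames)

# Read back the parameters the text describes.
with wave.open(path, "rb") as w:
    params = w.getparams()          # nchannels, sampwidth, framerate, nframes, ...
```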
S204: and inputting the recording file into the trained model to identify the tested speaker.
Specifically, the stored audio file is preprocessed: the speech signal is denoised by Wiener filtering and then cut into speech segments of equal length, for example 10 seconds; Mel-frequency cepstral coefficient features are extracted from each segment, which is then input into the trained model and classified. The neural network model is shown in fig. 4.
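The fixed-length segmentation step can be sketched as follows; dropping the trailing remainder is one simple policy — the patent does not specify how a partial final segment is handled.

```python
import numpy as np

def segment(signal, fs, seconds=10):
    """Cut a denoised recording into equal-length segments (e.g. 10 s);
    any trailing remainder shorter than one segment is dropped."""
    seg_len = int(fs * seconds)
    n_full = len(signal) // seg_len
    return signal[: n_full * seg_len].reshape(n_full, seg_len)

fs = 8000
recording = np.zeros(fs * 35)        # a 35-second recording at 8 kHz
segments = segment(recording, fs, seconds=10)   # yields three 10-second segments
```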
The neural network structure includes:
S301, a feature input layer, which performs feature engineering on the speech in the call voice library and extracts the required features; here the Mel cepstral coefficients of the speech in the voice library are extracted as feature vectors;
S302, a fully connected layer;
S303, an attention-based time-recurrent neural network layer;
S304, a normalized exponential (softmax) function layer, which normalizes the computed result;
and S305, an output layer converting codes to person names, which outputs the result computed by the softmax layer of S304.
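A minimal sketch of the Mel-cepstral front end used by the feature input layer (S301): framing, power spectrum, mel filterbank, log, DCT. It is a simplified stand-in for what a library such as python_speech_features computes, with illustrative frame and filterbank sizes.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, fs, n_filters=26, n_ceps=13, frame_len=0.025, frame_step=0.01):
    """Compute Mel-frequency cepstral coefficients, one row per frame."""
    # Frame the signal into overlapping windows (25 ms frames, 10 ms step).
    flen, fstep = int(fs * frame_len), int(fs * frame_step)
    n_frames = 1 + (len(signal) - flen) // fstep
    idx = np.arange(flen) + fstep * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(flen)
    # Per-frame power spectrum.
    nfft = 512
    pow_spec = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Triangular mel filterbank between 0 Hz and the Nyquist frequency.
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    imel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = imel(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    bins = np.floor((nfft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    # Log filterbank energies, then DCT gives the cepstral coefficients.
    feat = np.log(pow_spec @ fbank.T + 1e-10)
    return dct(feat, type=2, axis=1, norm="ortho")[:, :n_ceps]

fs = 8000
sig = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)   # one-second test tone
features = mfcc(sig, fs)                             # (frames, 13) feature matrix
```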
This embodiment can only identify existing tested persons, i.e. those already in the speaker voice library; if the audio file to be recognized belongs to a new tested person, it is recognized as the closest existing tested person.
In embodiment 2 provided by the invention, an attention-based neural network is trained on the tested caller's speech to obtain a trained model, the trained model identifies the tested caller, the consistency between the dialed number and its owner is confirmed, and the security risk of an impersonated caller identity is avoided.
Example 3
Embodiment 3 of the present invention further provides a device for identifying a talker based on an attention mechanism, as shown in fig. 3.
And the collection module 10 is used for constructing a tested caller voice library corresponding to the identity of the tested caller and the audio file. The collecting module 10 can be further divided into a telephone recording module 11 and a database processing module 12.
The telephone recording module 11 is used for recording the call process. Specifically, the tester places a call, planned or unplanned, to the tested caller, records the call, and saves the recording as an audio file in Wave format.
The testing party uses the built-in recording equipment of the smart phone in the call process or uses the earphone with the recording function to record the voice of the tested call person in the call process.
An Android phone can use the system's call-recording function during the call; the phone model, the storage format of the recording file, and the external environment characteristics (e.g. quiet, noisy) must be noted. An iPhone does not provide a call-recording function because of the system's privacy settings; recording can instead be done through a headset with a recording function, and the headset brand and model, the storage format of the recording file, and the external environment characteristics (e.g. quiet, noisy) must be noted.
When the testing party is a calling party, a built-in recording device or an earphone with a recording function of the smart phone can be opened after the called party answers; when the testing party is a called party, the built-in recording equipment of the smart phone or the earphone with the recording function can be opened when the calling party answers the calling.
And the database processing module 12 is used for associating the audio file with the identity of the tested person.
The testing party stores the collected audio file, the tested caller's identity, and the audio file's related configuration information in a database. The configuration information contains the identified party's identity information (e.g. phone number, name, location); the called party's environmental characteristics (e.g. indoor, street, store); the testing party's recording-device information (e.g. sampling frequency, noise-reduction characteristics, audio storage format); the testing party's environmental characteristics (e.g. indoor, street, store); the call duration; the call time; and the call volume.
And the training module 20 is used for training the audio files in the voice library of the tested person. The network architecture model used is shown in fig. 4.
Specifically, before training, the audio files are denoised and features are extracted. Denoising uses Wiener filtering, and the features are the audio file's Mel cepstral coefficients. The third-party Python audio-processing module python_speech_features can be used here.
The testing module 30 is used to identify the speaker to which the new audio file belongs.
During testing, the testing party still records the call with the tested caller by the recording method above. The audio file is preprocessed, i.e. after the denoising and feature extraction of the training module 20, it is passed into the trained model, and the model's output is received.
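The test-module flow can be sketched end to end: denoise, segment, score each segment with the trained model, and combine the per-segment decisions. The `model` here is a trivial stand-in, the feature step is a stub, and majority voting across segments is an assumption — the patent does not specify how segment-level outputs are combined.

```python
import numpy as np
from scipy.signal import wiener

def recognize(recording, fs, model, names, seconds=10):
    """Test-module flow sketched from the description: denoise the
    recording, cut it into fixed-length segments, score each segment
    with the trained model, and majority-vote the speaker."""
    denoised = wiener(recording, mysize=5)
    seg_len = int(fs * seconds)
    votes = []
    for start in range(0, len(denoised) - seg_len + 1, seg_len):
        feats = denoised[start : start + seg_len]   # feature stub: raw segment
        votes.append(int(np.argmax(model(feats))))
    counts = np.bincount(votes, minlength=len(names))
    return names[int(np.argmax(counts))]

# Stand-in "model": always scores the second class highest.
model = lambda feats: [0.1, float(np.mean(feats ** 2) >= 0.0) + 0.5, 0.2]
fs = 8000
rec = np.sin(2 * np.pi * 220 * np.arange(fs * 20) / fs)   # 20-second recording
speaker = recognize(rec, fs, model, ["Alice", "Bob", "Carol"])
```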
Compared with the prior art, the invention has the following technical effects:
The invention discloses a speaker recognition method and device based on an attention mechanism: an attention-based neural network is trained on the tested caller's speech to obtain a trained model, the trained model recognizes the tested caller, the consistency between the dialed number and its owner is confirmed, the communication security risk of an impersonated speaker identity is avoided, and information security during calls is thereby improved.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principle and implementation of the present invention are explained above through specific examples; the description of the embodiments is only intended to help in understanding the method of the present invention and its core idea. The described embodiments are only a part of the embodiments of the present invention, not all of them, and all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
Claims (7)
1. A speaker identification method based on an attention mechanism, characterized by comprising the following steps:
collecting the call records of a plurality of tested callers and the call records of the tested callers;
establishing a speaker voice library according to the call record corresponding to the tested speaker;
training a neural network based on an attention mechanism by adopting the call record of the tested caller to obtain a training model;
storing the call records of the test caller and the tested caller to obtain a record file;
adopting the training model to identify whether the tested speaker is a target speaker according to the recording file,
wherein training the attention-mechanism-based neural network with the tested caller's call recording to obtain the trained model comprises:
denoising the call recording of the tested caller by adopting a wiener filter to obtain a preprocessed recording file;
extracting voice features from the preprocessed sound recording file through an input layer of a time recursive neural network to obtain a Mel cepstrum coefficient feature vector of voice in the preprocessed sound recording file;
sending the Mel cepstrum coefficient feature vector to a full connection layer, and performing feature extraction on the Mel cepstrum coefficient feature vector by the full connection layer to obtain a second feature vector of the voice in the preprocessed audio file;
sending the second feature vector to an attention-based time-recursive neural network layer, wherein the attention-based time-recursive neural network layer comprises a plurality of LSTM layers, and the second feature vector is processed through the plurality of LSTM layers to obtain processed data;
and sending the processing data to a normalization index function layer, wherein the normalization index function layer correspondingly converts the processing data and the name of the person to obtain the name of the person corresponding to the processing data.
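The training pipeline recited above (denoise with a Wiener filter, extract MFCC frames, project through a fully connected layer, pool with attention over recurrent-layer outputs, classify with a softmax over speaker names) can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the MFCC frames are random stand-ins, a single projection replaces the LSTM stack, and all shapes, weights, and speaker names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Normalized exponential function, shifted for numerical stability.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in MFCC frames (T frames x D coefficients); a real system would
# compute these from the Wiener-filtered call recording.
T, D, H, num_speakers = 50, 13, 32, 4
mfcc = rng.normal(size=(T, D))

# Fully connected layer: maps each MFCC frame to a second feature vector.
W_fc, b_fc = rng.normal(size=(D, H)), np.zeros(H)
h = np.tanh(mfcc @ W_fc + b_fc)              # (T, H)

# Attention over time: score frames, normalize, and pool -- a stand-in
# for the attention-based LSTM layers of the claim.
w_att = rng.normal(size=(H,))
alpha = softmax(h @ w_att)                   # (T,) attention weights
utterance = alpha @ h                        # (H,) utterance embedding

# Softmax output layer over enrolled speaker names.
W_out = rng.normal(size=(H, num_speakers))
probs = softmax(utterance @ W_out)           # (num_speakers,)
names = ["spk_a", "spk_b", "spk_c", "spk_d"]
predicted = names[int(np.argmax(probs))]
print(predicted)
```

In training, the weights above would be fitted by backpropagation on labeled call recordings; here they are random, so only the data flow and shapes are meaningful.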
2. The speaker recognition method of claim 1, wherein collecting call recordings of a plurality of tested speakers and of the testing party comprises:
during the call, the testing party records the voice of the tested speaker using the smartphone's built-in call recording function; the device model, the storage format of the recording file, and the external environment characteristics during the call are determined; and the call recording is saved as a lossless file in Wave format.
3. The speaker recognition method of claim 1, wherein establishing a speaker voice library from the call recordings corresponding to the tested speakers comprises:
acquiring the correspondence between the identity of each tested speaker and the tested voice;
and establishing a speaker voice library from that correspondence, wherein the speaker voice library comprises the audio data of the identified speaker, the identity information of the tested speaker, the environmental characteristics of the testing party, the recording-device information of the testing party, the environmental characteristics of the tested speaker, the call duration, the call time, and the call volume.
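A minimal way to model such a voice-library record is a keyed store; the field names below paraphrase the list in the claim and are illustrative, not the patent's schema.

```python
from dataclasses import dataclass

@dataclass
class SpeakerRecord:
    # Field names paraphrase the claim's list; they are hypothetical.
    speaker_id: str          # identity information of the tested speaker
    audio_path: str          # path to the audio data
    device_info: str         # recording-device information
    environment: str         # environmental characteristics
    call_duration_s: float   # call duration in seconds
    call_time: str           # time of the call
    call_volume_db: float    # call volume

# The voice library: identity -> list of enrolled call records.
library: dict = {}

def enroll(rec: SpeakerRecord) -> None:
    """Add a record under the speaker's identity, building the library."""
    library.setdefault(rec.speaker_id, []).append(rec)

enroll(SpeakerRecord("alice", "calls/alice_001.wav", "example-phone",
                     "quiet indoor", 62.0, "2019-07-26T10:00", -23.5))
print(sorted(library))
```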
4. The speaker recognition method of claim 1, wherein using the training model to identify from the recording file whether the tested speaker is the target speaker comprises:
judging whether the voice to be tested exists in the speaker voice library; if so, identifying it as the matching speaker in the library; otherwise, the audio to be identified belongs to a new speaker, and the closest existing speaker is reported.
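The decision recited above can be sketched as a nearest-neighbor lookup over enrolled speaker embeddings. The cosine scoring and the threshold below are assumptions for illustration; the claim does not specify a similarity measure.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(probe: np.ndarray, enrolled: dict, threshold: float = 0.7):
    """Return (closest_name, is_enrolled): if no enrolled embedding scores
    above the threshold, the probe is treated as a new speaker, but the
    closest existing speaker is still reported, as claim 4 describes."""
    name, score = max(((n, cosine(probe, e)) for n, e in enrolled.items()),
                      key=lambda t: t[1])
    return name, score >= threshold

# Hypothetical 2-D embeddings for two enrolled speakers.
enrolled = {"alice": np.array([1.0, 0.0]), "bob": np.array([0.0, 1.0])}
print(identify(np.array([0.9, 0.1]), enrolled))
```

A probe far from every enrolled embedding still yields a closest name, with the flag marking it as a new speaker.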
5. A speaker recognition device based on an attention mechanism, characterized by comprising:
the collecting module, configured to collect call recordings of a plurality of tested speakers and of the testing party;
the voice library establishing module, configured to establish a speaker voice library from the call recordings corresponding to the tested speakers;
the training module, configured to train an attention-based neural network with the call recordings of the tested speakers to obtain a training model;
the file storage module, configured to store the call recordings of the testing party and the tested speakers to obtain recording files;
the test module, configured to use the training model to identify, from a recording file, whether a tested speaker is the target speaker,
wherein the training module is configured to:
denoising the call recording of the tested speaker with a Wiener filter to obtain a preprocessed recording file;
extracting voice features from the preprocessed recording file through the input layer of a time-recurrent neural network to obtain Mel-frequency cepstral coefficient (MFCC) feature vectors of the voice in the preprocessed recording file;
sending the MFCC feature vectors to a fully connected layer, which performs further feature extraction to obtain second feature vectors of the voice in the preprocessed recording file;
sending the second feature vectors to an attention-based time-recurrent neural network layer comprising a plurality of LSTM layers, which process the second feature vectors to obtain processed data;
and sending the processed data to a normalized exponential (softmax) layer, which maps the processed data to speaker names, thereby obtaining the speaker name corresponding to the processed data.
6. The speaker recognition device of claim 5, wherein the collecting module comprises:
the testing-party unit, configured for the testing party to record the voice of the tested speaker during the call using the smartphone's built-in call recording function;
the recording unit, configured to use the system's call recording function during the call, to determine the device model, the storage format of the recording file, and the external environment characteristics during the call, and to save the call recording as a lossless file in Wave format.
7. The speaker recognition device of claim 5, wherein the voice library establishing module comprises:
the correspondence acquisition unit, configured to acquire the correspondence between the identity of each tested speaker and the tested voice;
and the voice library establishing unit, configured to establish a speaker voice library from that correspondence, wherein the speaker voice library comprises the audio data of the identified speaker, the identity information of the tested speaker, the environmental characteristics of the testing party, the recording-device information of the testing party, the environmental characteristics of the tested speaker, the call duration, the call time, and the call volume.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910684343.7A CN110556114B (en) | 2019-07-26 | 2019-07-26 | Speaker identification method and device based on attention mechanism |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910684343.7A CN110556114B (en) | 2019-07-26 | 2019-07-26 | Speaker identification method and device based on attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110556114A CN110556114A (en) | 2019-12-10 |
CN110556114B true CN110556114B (en) | 2022-06-17 |
Family
ID=68736524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910684343.7A Expired - Fee Related CN110556114B (en) | 2019-07-26 | 2019-07-26 | Speaker identification method and device based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110556114B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111785287B (en) * | 2020-07-06 | 2022-06-07 | Beijing Century TAL Education Technology Co., Ltd. | Speaker recognition method, speaker recognition device, electronic equipment and storage medium |
CN114040052B (en) * | 2021-11-01 | 2024-01-19 | Jiangsu Haobai Information Service Co., Ltd. | Method for audio collection and effective audio screening for telephone voiceprint recognition |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108269569A (en) * | 2017-01-04 | 2018-07-10 | Samsung Electronics Co., Ltd. | Speech recognition method and apparatus |
US20180308487A1 (en) * | 2017-04-21 | 2018-10-25 | Go-Vivace Inc. | Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response |
US20180374486A1 (en) * | 2017-06-23 | 2018-12-27 | Microsoft Technology Licensing, Llc | Speaker recognition |
CN109155132A (en) * | 2016-03-21 | 2019-01-04 | Amazon Technologies, Inc. | Speaker verification method and system |
CN109215662A (en) * | 2018-09-18 | 2019-01-15 | Ping An Technology (Shenzhen) Co., Ltd. | End-to-end speech recognition method, electronic device and computer-readable storage medium |
CN109256135A (en) * | 2018-08-28 | 2019-01-22 | Guilin University of Electronic Technology | End-to-end speaker recognition method, device and storage medium |
CN109637545A (en) * | 2019-01-17 | 2019-04-16 | Harbin Engineering University | Voiceprint recognition method based on one-dimensional convolution and asymmetric bidirectional long short-term memory networks |
CN109801635A (en) * | 2019-01-31 | 2019-05-24 | Beijing SoundAI Technology Co., Ltd. | Voiceprint feature extraction method and device based on an attention mechanism |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100454942C (en) * | 2004-06-25 | 2009-01-21 | Lenovo (Beijing) Ltd. | System and method for multimode communication of a mobile terminal |
CN101848277A (en) * | 2010-04-23 | 2010-09-29 | ZTE Corporation | Mobile terminal and method for storing conversation contents in real time |
CN103391347B (en) * | 2012-05-10 | 2018-06-08 | ZTE Corporation | Method and device for automatic recording |
CN103167371A (en) * | 2013-04-09 | 2013-06-19 | Beijing Xingkedi Technology Co., Ltd. | Bluetooth headset with recording storage function and vehicle equipped with same |
CN104580647B (en) * | 2014-12-31 | 2018-11-06 | Huizhou TCL Mobile Communication Co., Ltd. | Caching method for call recordings and communication device |
CN205961381U (en) * | 2016-07-20 | 2017-02-15 | Shenzhen Waytronic Electronics Co., Ltd. | Recording earphone |
US20180330718A1 (en) * | 2017-05-11 | 2018-11-15 | Mitsubishi Electric Research Laboratories, Inc. | System and Method for End-to-End speech recognition |
CN107580102A (en) * | 2017-08-22 | 2018-01-12 | Shenzhen Transsion Holdings Co., Ltd. | Earphone and method for earphone recording |
CN107993663A (en) * | 2017-09-11 | 2018-05-04 | Beihang University | Android-based voiceprint recognition method |
CN109040444B (en) * | 2018-07-27 | 2020-08-14 | Vivo Mobile Communication Co., Ltd. | Call recording method, terminal and computer-readable storage medium |
CN109979429A (en) * | 2019-05-29 | 2019-07-05 | Nanjing Silicon Intelligence Technology Co., Ltd. | Method and system for TTS |
- 2019-07-26: CN CN201910684343.7A patent/CN110556114B/en not_active Expired - Fee Related
Non-Patent Citations (1)
Title |
---|
End-to-End Attention based Text-Dependent Speaker Verification; Shi-Xiong Zhang et al.; Spoken Language Technology Workshop; 2017-02-09; pp. 171-178 *
Also Published As
Publication number | Publication date |
---|---|
CN110556114A (en) | 2019-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110136727B (en) | Speaker identification method, device and storage medium based on speaking content | |
CN107274916B (en) | Method and device for operating audio/video file based on voiceprint information | |
CN111128223B (en) | Text information-based auxiliary speaker separation method and related device | |
CN108877823B (en) | Speech enhancement method and device | |
CN104766608A (en) | Voice control method and voice control device | |
CN104485102A (en) | Voiceprint recognition method and device | |
CN108010513B (en) | Voice processing method and device | |
CN113488024B (en) | Telephone interrupt recognition method and system based on semantic recognition | |
CN113823293B (en) | Speaker recognition method and system based on voice enhancement | |
CN111145763A (en) | GRU-based voice recognition method and system in audio | |
CN108848507A (en) | Method for collecting information about bad telecommunication users | |
CN109829691B (en) | C/S card punching method and device based on position and deep learning multiple biological features | |
CN110556114B (en) | Speaker identification method and device based on attention mechanism | |
CN110517697A (en) | Intelligent prompt-tone cutoff device for interactive voice response | |
CN107705791A (en) | Caller identity confirmation method and device based on voiceprint recognition, and voiceprint recognition system | |
CN105679323B (en) | Number discovery method and system | |
CN113744742B (en) | Role identification method, device and system under dialogue scene | |
CN110600032A (en) | Voice recognition method and device | |
CN111627448A (en) | System and method for interrogation control based on voice big data | |
CN108665901B (en) | Phoneme/syllable extraction method and device | |
CN109273012B (en) | Identity authentication method based on speaker recognition and digital voice recognition | |
CN109817223A (en) | Phoneme marking method and device based on audio fingerprints | |
CN113409774A (en) | Voice recognition method and device and electronic equipment | |
Zou et al. | Automatic cell phone recognition from speech recordings | |
CN111986680A (en) | Method and device for evaluating spoken language of object, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
GR01 | Patent grant | | |
CF01 | Termination of patent right due to non-payment of annual fee | | |
Granted publication date: 20220617 |