CN110556114B - Speaker identification method and device based on attention mechanism - Google Patents


Info

Publication number
CN110556114B
CN110556114B (application CN201910684343.7A)
Authority
CN
China
Prior art keywords
tested
call
voice
caller
speaker
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910684343.7A
Other languages
Chinese (zh)
Other versions
CN110556114A (en)
Inventor
林格平
戚梦苑
沈亮
李娅强
刘发强
孙旭东
孙晓晨
宁珊
蔡文强
王玉龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
National Computer Network and Information Security Management Center
Original Assignee
Beijing University of Posts and Telecommunications
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications and National Computer Network and Information Security Management Center
Priority to CN201910684343.7A
Publication of CN110556114A
Application granted
Publication of CN110556114B
Expired - Fee Related (current legal status)
Anticipated expiration


Classifications

    • G - PHYSICS
        • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L17/00 - Speaker identification or verification techniques
                    • G10L17/04 - Training, enrolment or model building
                    • G10L17/18 - Artificial neural networks; Connectionist approaches
    • H - ELECTRICITY
        • H04 - ELECTRIC COMMUNICATION TECHNIQUE
            • H04M - TELEPHONIC COMMUNICATION
                • H04M1/00 - Substation equipment, e.g. for use by subscribers
                    • H04M1/64 - Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
                        • H04M1/65 - Recording arrangements for recording a message from the calling party
                            • H04M1/656 - Recording arrangements for recording a message from the calling party for recording conversations
                    • H04M1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
                        • H04M1/724 - User interfaces specially adapted for cordless or mobile telephones
                            • H04M1/72403 - User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
                            • H04M1/72448 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
                                • H04M1/72454 - User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to context-related or environment-related conditions

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Environmental & Geological Engineering (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a speaker identification method and device based on an attention mechanism, comprising the following steps: collecting call recordings of a plurality of tested callers and call recordings of the test caller; establishing a speaker voice library from the call recordings corresponding to the tested callers; training an attention-based neural network on the voice of the tested callers to obtain a training model; storing the call recording of the test caller to obtain a recording file; and identifying, with the training model and the recording file, whether the tested caller is the target caller. By training an attention-based neural network on the tested callers' voices and identifying the tested caller with the resulting model, the method confirms that the owner of the dialed number is consistent with the actual speaker, avoids the communication security risk of an impersonated caller identity, and thereby improves information security during calls.

Description

Speaker identification method and device based on attention mechanism
Technical Field
The invention relates to the field of voice recognition, in particular to a speaker recognition method and device based on an attention mechanism.
Background
Speech is the most direct and most important mode of communication in human work and life; the human voice carries semantic information, language or dialect information, channel information, and so on. With the continuing progress of computer technology and the arrival of the network era, more and more means are available to disguise a speaker's identity.
Determining the speaker's identity during a call helps to establish the security of the communication. Existing speaker recognition methods include using an approximation of the KL divergence as a measure of similarity between speakers, recognizing speakers with a BP neural network, recognizing speakers with combined MFCC and GFCC features, and so on. However, most applications of these techniques are smart-home scenarios, for example a sweeping robot recognizing from the voice whether the speaker is its owner. The prior art offers no speaker recognition method for the call process and does not consider the silent portions of speech during a call.
Most speech recognition methods in the prior art target scenarios in which the speech signal is captured directly from the environment; a method for recognizing the speaker during a telephone call is lacking, leaving a security risk that a caller's identity can be impersonated.
Disclosure of Invention
The invention aims to provide a speaker identification method and device based on an attention mechanism to solve the above technical problems.
To achieve this purpose, the invention provides the following scheme:
In a first aspect of the embodiments of the present invention, there is provided a method for identifying a speaker based on an attention mechanism, including the following steps:
collecting call recordings of a plurality of tested callers and call recordings of the test caller;
establishing a speaker voice library from the call recordings corresponding to the tested callers;
training an attention-based neural network on the voice of the tested callers to obtain a training model;
storing the call recording of the test caller to obtain a recording file;
and identifying, with the training model and the recording file, whether the tested caller is the target caller.
Optionally, the step of collecting call recordings of a plurality of tested callers and call recordings of the test caller comprises:
during the call, the testing party records the voice of the tested caller using the smartphone's built-in recording function; when the system's call recording function is used, the phone model, the storage format of the recording file, and the external environment characteristics during the call need to be noted; and the call recording is saved as a lossless file in Wave format.
Optionally, the step of establishing a speaker voice library from the call recordings corresponding to the tested callers includes:
acquiring the correspondence between the identity of each tested caller and the tested voice;
and establishing the speaker voice library according to the correspondence, wherein the speaker voice library comprises the call audio data of the identified party, the identity information of the tested party, the environment characteristics of the tested party, the recording equipment information of the testing party, the environment characteristics of the testing party, the call duration, the call time, and the call volume.
Optionally, the step of training an attention-based neural network on the voice of the tested callers to obtain a training model includes:
denoising the recording files with a wiener filter to obtain preprocessed recording files;
and training an attention-based time-recurrent neural network on the preprocessed recording files to obtain the training model.
Optionally, the step of training the attention-based time-recurrent neural network on the preprocessed recording files to obtain the training model includes:
extracting voice features from the preprocessed recording file through the input layer of the time-recurrent neural network to obtain Mel cepstrum coefficient feature vectors of the voice in the preprocessed recording file;
sending the Mel cepstrum coefficient feature vectors to a fully connected layer, which (acting like an autoencoder) performs further feature extraction to obtain second feature vectors of the voice in the preprocessed recording file;
sending the second feature vectors to an attention-based time-recurrent neural network layer comprising a plurality of LSTM layers, which process the second feature vectors to obtain processed data;
and sending the processed data to a softmax (normalized exponential function) layer, which maps the processed data to person names, obtaining the name corresponding to the processed data.
Optionally, the step of identifying whether the tested caller is the target caller with the training model and the recording file includes:
judging whether the voice to be tested exists in the speaker voice library; if so, identifying it as the corresponding tested person in the speaker voice library; otherwise, if the audio file to be recognized belongs to a new, unenrolled person, it is recognized as the closest existing tested person, i.e. the existing class with the maximum confidence value.
In order to achieve the above object, the present invention further provides the following solutions:
a speaker recognition apparatus based on an attention mechanism, comprising:
the collection module is used for collecting call recordings of a plurality of tested callers and call recordings of the test caller;
the voice library establishing module is used for establishing a speaker voice library from the call recordings corresponding to the tested callers;
the training module is used for training an attention-based neural network on the voice of the tested callers to obtain a training model;
the file storage module is used for storing the call recording of the test caller to obtain a recording file;
and the test module is used for identifying, with the training model and the recording file, whether the tested caller is the target caller.
Optionally, the collecting module specifically includes:
the testing party unit is used for recording, during the call, the voice of the tested caller with the smartphone's built-in recording function;
the recording unit is used for using the system's call recording function during the call and noting the phone model, the storage format of the recording file, and the external environment characteristics during the call; the call recording is saved as a lossless file in Wave format.
Optionally, the voice library establishing module specifically includes:
the correspondence acquisition unit is used for acquiring the correspondence between the identity of each tested caller and the tested voice;
and the voice library establishing unit is used for establishing the speaker voice library according to the correspondence, wherein the speaker voice library comprises the call audio data of the identified party, the identity information of the tested party, the environment characteristics of the tested party, the recording equipment information of the testing party, the environment characteristics of the testing party, the call duration, the call time, and the call volume.
Optionally, the training module specifically includes:
the preprocessing unit is used for denoising the sound recording file by adopting a wiener filter to obtain a preprocessed sound recording file;
and the training model establishing unit is used for training the preprocessed sound recording file by adopting a time recurrent neural network based on an attention mechanism to obtain a training model.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention discloses a speaker recognition method and device based on an attention mechanism. An attention-based neural network is trained on the voice of the tested callers to obtain a training model, and the trained model is used to recognize the tested caller. With only the audio and an existing call voice library, the speaker in a call can be determined, so the user can check whether the displayed number matches the actual speaker and judge its reliability. This effectively defends against fraud carried out by imitating a caller's voice and indirectly protects the user's communication security.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a caller identification method based on an attention mechanism according to embodiment 1 of the present invention;
FIG. 2 is a flowchart of a speaker identification method based on an attention mechanism according to embodiment 2 of the present invention;
fig. 3 is a schematic structural diagram of a device for identifying a caller based on an attention mechanism according to embodiment 3 of the present invention;
fig. 4 is a diagram of a neural network structure provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the invention without any inventive step, are within the scope of protection of the invention.
Example 1
Embodiment 1 of the present invention provides an embodiment of a speaker recognition method based on an attention mechanism, and as shown in fig. 1, the method includes the following steps:
S101: collecting call recordings of a plurality of tested callers and call recordings of the test caller;
S102: establishing a caller voice library from the call recordings corresponding to the tested callers;
S103: training an attention-based neural network on the voice of the tested callers to obtain a training model;
S104: storing the call recording of the test caller to obtain a recording file;
S105: identifying, with the training model and the recording file, whether the tested caller is the target caller.
During answering, the testing party records the call, and the call recording is then associated with the tested caller to build the tested callers' voice library;
the voice characteristics of the callers are learned with an attention-based LSTM neural network to generate a speaker recognition model;
when identifying a caller, the testing party records the call and saves the recording as a Wave audio file;
and the saved audio file is passed through a wiener filter, Mel cepstrum coefficients are extracted, and the features are fed into the trained speaker recognition model to identify the tested caller.
The attention-based time-recurrent neural network can be implemented directly with the open-source tool TensorFlow.
The network parameters used in the embodiment of the invention are as follows: rnn_size can be set as needed, for example 64; attn_length can be set as needed, for example 64. It is emphasized that the core of the present invention is the speaker identification method based on the attention mechanism; modifying network parameters and similar adjustments to the network are all within the scope of the present invention.
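As a concrete illustration, the following is a minimal TensorFlow 1.x sketch of an attention-wrapped stacked-LSTM classifier of the kind described above, using rnn_size = 64 and attn_length = 64 as in this embodiment. The number of stacked layers, the number of enrolled speakers, the MFCC dimension, and the width of the fully connected layer are assumptions for illustration, not values taken from the patent; tf.contrib.rnn.AttentionCellWrapper exists only in TensorFlow 1.x.

```python
import tensorflow as tf  # TensorFlow 1.x; tf.contrib.rnn was removed in TF 2.x

rnn_size = 64      # hidden units per LSTM layer (value from this embodiment)
attn_length = 64   # attention window length (value from this embodiment)
num_layers = 2     # number of stacked LSTM layers (assumed)
num_speakers = 10  # number of enrolled callers (assumed)
n_mfcc = 13        # MFCC dimension per frame (assumed)

# [batch, time, n_mfcc] MFCC sequences extracted from the preprocessed recordings
features = tf.placeholder(tf.float32, [None, None, n_mfcc], name="mfcc")
labels = tf.placeholder(tf.int32, [None], name="speaker_id")

# fully connected layer applied frame by frame (the "second feature vector")
dense = tf.layers.dense(features, 128, activation=tf.nn.relu)

# stacked LSTM layers, each wrapped with an attention window
cells = [
    tf.contrib.rnn.AttentionCellWrapper(
        tf.nn.rnn_cell.LSTMCell(rnn_size), attn_length=attn_length)
    for _ in range(num_layers)
]
outputs, _ = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.MultiRNNCell(cells), dense, dtype=tf.float32)

# softmax (normalized exponential) layer over the enrolled speakers
logits = tf.layers.dense(outputs[:, -1, :], num_speakers)
probs = tf.nn.softmax(logits)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```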
The attention-based speaker recognition provided by the embodiment of the invention can determine the speaker during a call using only audio and an existing call voice library, making it easy for the user to check whether the displayed number matches the actual speaker, judge its reliability, and thereby indirectly protect communication security.
Example 2
Embodiment 2 of the present invention provides a preferred embodiment of a speaker recognition method based on the attention mechanism. Referring to fig. 2, in this embodiment, the method includes the steps of:
S201: collecting the users' call recordings and associating each recording with the corresponding tested caller to build a caller voice library.
During the call, the testing party records the voice of the tested caller with the smartphone's built-in recording equipment or with an earphone that has a recording function.
An Android phone can use the system's call recording function during the call; the phone model, the storage format of the recording file, and the external environment characteristics during the call (e.g., quiet or noisy) need to be noted. An Apple phone does not provide a call recording function because of the system's privacy settings, so recording can be done through an earphone with a recording function; the earphone brand and model, the storage format of the recording file, and the external environment characteristics (e.g., quiet or noisy) need to be noted.
When the testing party is the calling party, the smartphone's built-in recording equipment or the recording earphone can be turned on after the called party answers; when the testing party is the called party, it can be turned on when answering the calling party.
The voice library is a correlation library of the identity of the tested speaker and the tested voice, and aims to provide data for subsequent model training.
The voice database comprises the call audio data of the identified party; identity information of the identified party (e.g., phone number, name, location); environmental characteristics of the identified party (e.g., indoor, street, store); the testing party's recording equipment information (e.g., sampling frequency, noise reduction characteristics, audio storage format); environmental characteristics of the testing party (e.g., indoor, street, store); the call duration; the call time; and the call volume.
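For illustration only, a record of such a voice library might be organized as in the following Python sketch; the field names and example values are assumptions that mirror the items listed above, not a schema defined by the patent.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative sketch of one record in the speaker voice library; field names and
# example values are assumptions, not a schema defined by the patent.
@dataclass
class VoiceLibraryRecord:
    audio_path: str                 # call audio of the identified (tested) party
    tested_party_identity: dict     # e.g. phone number, name, location
    tested_party_environment: str   # e.g. "indoor", "street", "store"
    recording_device_info: dict     # e.g. sampling frequency, noise reduction, format
    testing_party_environment: str  # e.g. "indoor"
    call_duration_s: float          # call duration in seconds
    call_time: datetime             # when the call took place
    call_volume: float              # relative call volume

record = VoiceLibraryRecord(
    audio_path="library/caller_A_001.wav",
    tested_party_identity={"phone": "unknown", "name": "caller_A", "location": "unknown"},
    tested_party_environment="indoor",
    recording_device_info={"sample_rate": 44100, "noise_reduction": "none", "format": "wav"},
    testing_party_environment="indoor",
    call_duration_s=63.0,
    call_time=datetime(2019, 7, 26, 10, 30),
    call_volume=0.7,
)
```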
S202: training the attention-based neural network on the voice of the callers to generate a training model.
Specifically, the voice files in the voice library are preprocessed: a wiener filter is used for simple noise reduction so that noise does not affect the rest of the process.
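A minimal preprocessing sketch along these lines, assuming 16-bit PCM Wave recordings and using scipy's wiener filter, could look as follows; the file names and the filter window size are illustrative placeholders.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import wiener

# Simple noise-reduction sketch with a wiener filter, assuming 16-bit PCM Wave
# recordings; file names and window size are illustrative placeholders.
rate, samples = wavfile.read("tested_caller_call.wav")
if samples.ndim > 1:                      # stereo call recording -> mono
    samples = samples.mean(axis=1)
denoised = wiener(samples.astype(np.float64), mysize=1025)
wavfile.write("tested_caller_call_denoised.wav", rate, denoised.astype(np.int16))
```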
The neural network used is based on a time-recursive neural network of the attention mechanism, the network structure being shown in fig. 4.
S203: during testing, the testing party records the call and stores the recording file.
In particular, the saved audio file format is required to be lossless, such as WAVE, FLAC, APE, ALAC, WavPack, and the like.
WAVE typically describes sound with three parameters: quantization bit depth, sampling frequency, and sample amplitude. The bit depth is commonly 8, 16, or 24 bits; audio may be mono or stereo, with mono amplitude data stored as an n x 1 matrix and stereo as an n x 2 matrix; the sampling frequency is generally 11025 Hz (11 kHz), 22050 Hz (22 kHz), or 44100 Hz (44 kHz). Sound quality is excellent, but the files are relatively large. The encoding mode of the audio file is also recorded.
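For example, the parameters of a saved Wave recording can be checked with Python's standard wave module, as in the short sketch below; the file name is a placeholder.

```python
import wave

# Checking the parameters of a saved Wave call recording; the file name is a placeholder.
wav = wave.open("tested_caller_call.wav", "rb")
channels = wav.getnchannels()          # 1 = mono, 2 = stereo
bit_depth = wav.getsampwidth() * 8     # quantization: 8, 16 or 24 bits
sample_rate = wav.getframerate()       # e.g. 11025, 22050 or 44100 Hz
duration_s = wav.getnframes() / sample_rate
wav.close()
print(channels, bit_depth, sample_rate, round(duration_s, 1))
```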
S204: inputting the recording file into the trained model to identify the tested caller.
Specifically, the stored audio file is preprocessed: the speech signal is denoised with wiener filtering and then cut into speech segments of equal length, for example 10 seconds each. Mel frequency cepstrum coefficient features are extracted from each segment, each segment is fed into the trained model, and the segments are classified; the neural network model is shown in fig. 4.
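Continuing the preprocessing sketch above, the test-time segmentation step might look as follows; the 10-second segment length follows the example in the text, while the file name is a placeholder.

```python
from scipy.io import wavfile

# Cut the denoised recording into equal-length speech segments (10 seconds each,
# as in this example); the file name is a placeholder.
rate, samples = wavfile.read("tested_caller_call_denoised.wav")
segment_len = 10 * rate
segments = [samples[i:i + segment_len]
            for i in range(0, len(samples) - segment_len + 1, segment_len)]
# each segment is then MFCC-encoded and fed to the trained attention-based model
```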
The neural network structure includes:
S301: a feature input layer, which performs feature engineering on the voice in the call voice library and extracts the required features; here the Mel cepstrum coefficients of the voice in the voice library are extracted as feature vectors;
S302: a fully connected layer;
S303: an attention-based time-recurrent neural network layer;
S304: a softmax (normalized exponential function) layer that normalizes the computed results;
S305: an output layer that converts between class codes and person names and outputs the result computed by the softmax layer in S304.
This embodiment can only identify existing tested persons, i.e. those already in the speaker voice library; if the audio file to be identified belongs to a new person, it is identified as the closest existing tested person.
In embodiment 2 of the invention, an attention-based neural network is trained on the tested callers' voices to obtain a training model, the tested caller is identified with the training model, the consistency between the dialed number and its owner is confirmed, and the security risk of an impersonated caller identity is avoided.
Example 3
Embodiment 3 of the present invention further provides a caller identification device based on an attention mechanism, as shown in fig. 3.
The collection module 10 is used for building a tested-caller voice library that associates the identity of each tested caller with the corresponding audio file. The collection module 10 can be further divided into a telephone recording module 11 and a database processing module 12.
The telephone recording module 11 is used for recording the call; specifically, the testing party makes planned or unplanned calls with the tested caller, records the call, and stores the recording as an audio file in Wave format.
During the call, the testing party records the voice of the tested caller with the smartphone's built-in recording equipment or with an earphone that has a recording function.
An Android phone can use the system's call recording function during the call; the phone model, the storage format of the recording file, and the external environment characteristics during the call (e.g., quiet or noisy) need to be noted. An Apple phone does not provide a call recording function because of the system's privacy settings, so recording can be done through an earphone with a recording function; the earphone brand and model, the storage format of the recording file, and the external environment characteristics (e.g., quiet or noisy) need to be noted.
When the testing party is the calling party, the smartphone's built-in recording device or the recording earphone can be turned on after the called party answers; when the testing party is the called party, it can be turned on when answering the calling party.
The database processing module 12 is used for associating the audio file with the identity of the tested person.
The testing party stores the collected audio file, the identity of the tested caller, and the related configuration information of the audio file in a database. The configuration information contains the identity information of the identified party (e.g., telephone number, name, location); environmental characteristics of the identified party (e.g., indoor, street, store); the testing party's recording equipment information (e.g., sampling frequency, noise reduction characteristics, audio storage format); environmental characteristics of the testing party (e.g., indoor, street, store); the call duration; the call time; and the call volume.
The training module 20 is used for training on the audio files in the tested persons' voice library. The network architecture used is shown in fig. 4.
Specifically, before training, noise reduction is applied to the audio files and features are extracted. The noise reduction uses wiener filtering, and the features are the Mel cepstrum coefficients of the audio file. The Python audio processing module python_speech_features may be used here.
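A minimal feature-extraction sketch with python_speech_features could look as follows; the file name and the number of cepstral coefficients are illustrative assumptions.

```python
from python_speech_features import mfcc
from scipy.io import wavfile

# Extract Mel cepstrum coefficients from a denoised library recording; the file name
# and numcep value are illustrative assumptions.
rate, samples = wavfile.read("library_recording_denoised.wav")
features = mfcc(samples, samplerate=rate, numcep=13)  # shape: (num_frames, 13)
print(features.shape)
```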
The testing module 30 is used to identify the speaker to which the new audio file belongs.
During testing, the testing party again records the call with the tested caller using the recording method described above. The audio file is preprocessed, i.e. after the same noise reduction and feature extraction as in the training module 20, it is passed into the trained model and the model's output is received.
The test module 30 can only identify tested persons already known to the training module 20, i.e. those existing in the speaker voice library. If the audio file to be identified in the test module 30 belongs to a new person, it is identified as the closest existing tested person, namely the one whose existing class has the highest confidence value.
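For illustration, mapping the model's softmax output to the closest enrolled tested person might look like the following sketch; the enrolled names and the confidence threshold are assumptions, not values from the patent.

```python
import numpy as np

# Map the model's softmax output to the closest enrolled tested person; the names
# and the threshold below are illustrative assumptions.
enrolled_names = ["tested_person_A", "tested_person_B", "tested_person_C"]

def identify(softmax_probs, names=enrolled_names, threshold=0.5):
    """Return the enrolled person with the highest confidence.

    A recording from a new, unenrolled person is still mapped to the closest
    existing class (the one with the highest confidence); the threshold only
    flags low-confidence matches.
    """
    best = int(np.argmax(softmax_probs))
    confidence = float(softmax_probs[best])
    return names[best], confidence, confidence >= threshold

print(identify(np.array([0.08, 0.84, 0.08])))  # ('tested_person_B', 0.84, True)
```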
Compared with the prior art, the invention has the following technical effects:
The invention discloses a speaker recognition method and device based on an attention mechanism. An attention-based neural network is trained on the voice of the tested callers to obtain a training model, the tested caller is recognized with the training model, the consistency between the dialed number and its owner is confirmed, the communication security risk of an impersonated caller identity is avoided, and information security during calls is thereby improved.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and implementation of the present invention have been explained through specific examples. The above description of the embodiments is only intended to help understand the method of the present invention and its core idea. The described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Claims (7)

1. A method for identifying a caller based on an attention mechanism, characterized by comprising the following steps:
collecting call recordings of a plurality of tested callers and call recordings of a test caller;
establishing a speaker voice library from the call recordings corresponding to the tested callers;
training a neural network based on an attention mechanism with the call recording of the tested caller to obtain a training model;
storing the call recordings of the test caller and the tested caller to obtain a recording file;
identifying, with the training model and according to the recording file, whether the tested caller is a target caller,
wherein training the neural network based on the attention mechanism with the call recording of the tested caller to obtain the training model comprises:
denoising the call recording of the tested caller with a wiener filter to obtain a preprocessed recording file;
extracting voice features from the preprocessed recording file through an input layer of a time-recurrent neural network to obtain a Mel cepstrum coefficient feature vector of the voice in the preprocessed recording file;
sending the Mel cepstrum coefficient feature vector to a fully connected layer, which performs further feature extraction on it to obtain a second feature vector of the voice in the preprocessed recording file;
sending the second feature vector to an attention-based time-recurrent neural network layer comprising a plurality of LSTM layers, which process the second feature vector to obtain processed data;
and sending the processed data to a softmax (normalized exponential function) layer, which maps the processed data to person names, obtaining the name corresponding to the processed data.
2. The method according to claim 1, wherein collecting call recordings of a plurality of tested callers and a test caller comprises:
during the call, the testing party records the voice of the tested caller using the smartphone's built-in recording function; when the system's call recording function is used, the phone model, the storage format of the recording file, and the external environment characteristics during the call need to be noted; and the call recording is saved as a lossless file in Wave format.
3. The method according to claim 1, wherein establishing a speaker voice library from the call recording corresponding to the tested caller comprises:
acquiring the correspondence between the identity of the tested caller and the tested voice;
and establishing the speaker voice library according to the correspondence, wherein the speaker voice library comprises the call audio data of the identified party, the identity information of the tested party, the environment characteristics of the tested party, the recording equipment information of the testing party, the environment characteristics of the testing party, the call duration, the call time, and the call volume.
4. The method according to claim 1, wherein identifying, with the training model and according to the recording file, whether the tested caller is a target caller comprises:
judging whether the voice to be tested exists in the speaker voice library; if so, identifying it as the corresponding tested person in the speaker voice library; otherwise, if the audio file to be identified belongs to a new person, identifying it as the closest existing tested person.
5. A speaker recognition device based on an attention mechanism, comprising:
the collection module is used for collecting call recordings of a plurality of tested callers and call recordings of a test caller;
the voice library establishing module is used for establishing a speaker voice library from the call recordings corresponding to the tested callers;
the training module is used for training a neural network based on an attention mechanism with the call recording of the tested caller to obtain a training model;
the file storage module is used for storing the call recordings of the test caller and the tested caller to obtain a recording file;
the test module is used for identifying, with the training model and according to the recording file, whether the tested caller is a target caller,
wherein the training module is configured to:
denoising the call recording of the tested caller with a wiener filter to obtain a preprocessed recording file;
extracting voice features from the preprocessed recording file through an input layer of a time-recurrent neural network to obtain a Mel cepstrum coefficient feature vector of the voice in the preprocessed recording file;
sending the Mel cepstrum coefficient feature vector to a fully connected layer, which performs further feature extraction on it to obtain a second feature vector of the voice in the preprocessed recording file;
sending the second feature vector to an attention-based time-recurrent neural network layer comprising a plurality of LSTM layers, which process the second feature vector to obtain processed data;
and sending the processed data to a softmax (normalized exponential function) layer, which maps the processed data to person names, obtaining the name corresponding to the processed data.
6. The speaker recognition device based on an attention mechanism according to claim 5, wherein the collection module comprises:
the testing party unit is used for recording, during the call, the voice of the tested caller with the smartphone's built-in recording function;
the recording unit is used for using the system's call recording function during the call and noting the phone model, the storage format of the recording file, and the external environment characteristics during the call; the call recording is saved as a lossless file in Wave format.
7. The speaker recognition device based on an attention mechanism according to claim 5, wherein the voice library establishing module comprises:
the correspondence acquisition unit is used for acquiring the correspondence between the identity of the tested caller and the tested voice;
and the voice library establishing unit is used for establishing the speaker voice library according to the correspondence, wherein the speaker voice library comprises the call audio data of the identified party, the identity information of the tested party, the environment characteristics of the tested party, the recording equipment information of the testing party, the environment characteristics of the testing party, the call duration, the call time, and the call volume.
CN201910684343.7A 2019-07-26 2019-07-26 Speaker identification method and device based on attention mechanism Expired - Fee Related CN110556114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910684343.7A CN110556114B (en) 2019-07-26 2019-07-26 Speaker identification method and device based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910684343.7A CN110556114B (en) 2019-07-26 2019-07-26 Speaker identification method and device based on attention mechanism

Publications (2)

Publication Number Publication Date
CN110556114A CN110556114A (en) 2019-12-10
CN110556114B true CN110556114B (en) 2022-06-17

Family

ID=68736524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910684343.7A Expired - Fee Related CN110556114B (en) 2019-07-26 2019-07-26 Speaker identification method and device based on attention mechanism

Country Status (1)

Country Link
CN (1) CN110556114B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111785287B (en) * 2020-07-06 2022-06-07 北京世纪好未来教育科技有限公司 Speaker recognition method, speaker recognition device, electronic equipment and storage medium
CN114040052B (en) * 2021-11-01 2024-01-19 江苏号百信息服务有限公司 Method for identifying audio collection and effective audio screening of telephone voiceprint

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108269569A (en) * 2017-01-04 2018-07-10 三星电子株式会社 Audio recognition method and equipment
US20180308487A1 (en) * 2017-04-21 2018-10-25 Go-Vivace Inc. Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response
US20180374486A1 (en) * 2017-06-23 2018-12-27 Microsoft Technology Licensing, Llc Speaker recognition
CN109155132A (en) * 2016-03-21 2019-01-04 亚马逊技术公司 Speaker verification method and system
CN109215662A (en) * 2018-09-18 2019-01-15 平安科技(深圳)有限公司 End-to-end audio recognition method, electronic device and computer readable storage medium
CN109256135A (en) * 2018-08-28 2019-01-22 桂林电子科技大学 A kind of end-to-end method for identifying speaker, device and storage medium
CN109637545A (en) * 2019-01-17 2019-04-16 哈尔滨工程大学 Based on one-dimensional convolution asymmetric double to the method for recognizing sound-groove of long memory network in short-term
CN109801635A (en) * 2019-01-31 2019-05-24 北京声智科技有限公司 A kind of vocal print feature extracting method and device based on attention mechanism

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100454942C (en) * 2004-06-25 2009-01-21 联想(北京)有限公司 Realizing system and method for multimode communication of mobile terminal
CN101848277A (en) * 2010-04-23 2010-09-29 中兴通讯股份有限公司 Mobile terminal and method for storing conversation contents in real time
CN103391347B (en) * 2012-05-10 2018-06-08 中兴通讯股份有限公司 A kind of method and device of automatic recording
CN103167371A (en) * 2013-04-09 2013-06-19 北京兴科迪科技有限公司 Bluetooth headset with record storage function and vehicle with same
CN104580647B (en) * 2014-12-31 2018-11-06 惠州Tcl移动通信有限公司 A kind of caching method and communication device of calling record
CN205961381U (en) * 2016-07-20 2017-02-15 深圳唯创知音电子有限公司 Recording earphone
US20180330718A1 (en) * 2017-05-11 2018-11-15 Mitsubishi Electric Research Laboratories, Inc. System and Method for End-to-End speech recognition
CN107580102A (en) * 2017-08-22 2018-01-12 深圳传音控股有限公司 Earphone and the method for earphone recording
CN107993663A (en) * 2017-09-11 2018-05-04 北京航空航天大学 A kind of method for recognizing sound-groove based on Android
CN109040444B (en) * 2018-07-27 2020-08-14 维沃移动通信有限公司 Call recording method, terminal and computer readable storage medium
CN109979429A (en) * 2019-05-29 2019-07-05 南京硅基智能科技有限公司 A kind of method and system of TTS

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109155132A (en) * 2016-03-21 2019-01-04 亚马逊技术公司 Speaker verification method and system
CN108269569A (en) * 2017-01-04 2018-07-10 三星电子株式会社 Audio recognition method and equipment
US20180308487A1 (en) * 2017-04-21 2018-10-25 Go-Vivace Inc. Dialogue System Incorporating Unique Speech to Text Conversion Method for Meaningful Dialogue Response
US20180374486A1 (en) * 2017-06-23 2018-12-27 Microsoft Technology Licensing, Llc Speaker recognition
CN109256135A (en) * 2018-08-28 2019-01-22 桂林电子科技大学 A kind of end-to-end method for identifying speaker, device and storage medium
CN109215662A (en) * 2018-09-18 2019-01-15 平安科技(深圳)有限公司 End-to-end audio recognition method, electronic device and computer readable storage medium
CN109637545A (en) * 2019-01-17 2019-04-16 哈尔滨工程大学 Based on one-dimensional convolution asymmetric double to the method for recognizing sound-groove of long memory network in short-term
CN109801635A (en) * 2019-01-31 2019-05-24 北京声智科技有限公司 A kind of vocal print feature extracting method and device based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
End-to-End Attention based Text-Dependent Speaker Verification; Shi-Xiong Zhang et al.; Spoken Language Technology Workshop; 2017-02-09; pp. 171-178 *

Also Published As

Publication number Publication date
CN110556114A (en) 2019-12-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220617