CN112614493B - Voiceprint recognition method, system, storage medium and electronic device

Voiceprint recognition method, system, storage medium and electronic device

Info

Publication number
CN112614493B
Authority
CN
China
Prior art keywords
neural network
voiceprint
convolutional neural
layer
convolutional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011409154.8A
Other languages
Chinese (zh)
Other versions
CN112614493A (en)
Inventor
张鹏
吴伟
李明杰
詹培旋
王彬
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Lianyun Technology Co Ltd filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202011409154.8A
Publication of CN112614493A
Application granted
Publication of CN112614493B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application relates to the technical field of voiceprint recognition, and in particular to a voiceprint recognition method, system, storage medium and electronic device. It addresses the problem in the related art that square convolution with a fixed receptive field leads to a poor final voiceprint recognition effect. The method includes: extracting the voiceprint feature to be verified from the voice information through a pre-trained convolutional neural network model, where the model is obtained by training a convolutional neural network that includes a deformable convolutional layer; comparing the similarity between the voiceprint feature to be verified and the registered voiceprint feature; and judging whether the similarity result is greater than a preset threshold, in which case voiceprint recognition succeeds. Because the voiceprint features are extracted by a convolutional neural network with an added deformable convolutional layer, the receptive field adapts to different voiceprint features, the resulting convolutional neural network model is more robust, and voiceprint recognition accuracy is improved.

Description

Voiceprint recognition method, system, storage medium and electronic device
Technical Field
The present application relates to the field of voiceprint recognition technologies, and in particular, to a voiceprint recognition method, a voiceprint recognition system, a storage medium, and an electronic device.
Background
Voiceprint recognition is a technology for identity authentication based on voice and is one form of biometric recognition. Its field of application is very wide and will continue to expand with the development of intelligent speech technology. In recent years, deep learning has become a hot topic in voiceprint recognition, and voiceprint recognition systems modeled with deep convolutional neural networks show large improvements in recognition performance because they benefit from large amounts of labeled audio data.
In existing voiceprint recognition methods based on deep convolutional neural network modeling, the convolution kernel performs its operation on a local region of the input feature. With traditional square convolution, only voiceprint features inside a fixed square region can be sampled, the receptive field cannot adapt to different voiceprint features, and the final voiceprint recognition effect is poor.
Disclosure of Invention
In view of the above problems, the present application provides a voiceprint recognition method, system, storage medium, and electronic device, which solve the technical problem in the related art that square convolution with a fixed receptive field leads to a poor final voiceprint recognition effect.
In a first aspect, the present application provides a voiceprint recognition method, including:
receiving voice information;
extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
and judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint.
Optionally, the comparing the similarity between the voiceprint feature to be verified and the registered voiceprint feature that is registered in advance to obtain a similarity result includes:
and calculating the similarity between the voiceprint features to be verified and the registered voiceprint features which are registered in advance by a cosine calculation method to obtain a similarity result.
Optionally, the process of registering the voiceprint feature includes:
receiving registration voice information;
extracting registration voiceprint characteristics in the registration voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer.
Optionally, the training process of the convolutional neural network model includes:
establishing a convolutional neural network; the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer and a full-connection layer which are sequentially arranged; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer, the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer;
and training the convolutional neural network by taking the training voiceprint features which are marked in advance as input to obtain the convolutional neural network model.
Optionally, the training voiceprint feature is a mel-frequency cepstrum coefficient feature.
Optionally, the deformable convolution layer is configured to add an offset parameter to each element of the convolution kernel to obtain the adaptive receptive field.
Optionally, the first pooling layer and the second pooling layer are used to reduce feature size, enlarge the receptive field, and/or reduce the computational effort.
In a second aspect, the present application provides a voiceprint recognition system, the system including:
a receiving unit for receiving voice information;
the extracting unit is used for extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
the comparison unit is used for comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
and the verification unit is used for judging whether the similarity result is greater than a preset threshold value or not, and if the similarity result is greater than the preset threshold value, the voiceprint recognition is successful.
In a third aspect, the present application provides a storage medium storing a computer program executable by one or more processors to implement the voiceprint recognition method described in the first aspect.
In a fourth aspect, the present application provides an electronic device including a memory and a processor that are communicatively connected to each other, the memory storing a computer program that, when executed by the processor, performs the voiceprint recognition method described in the first aspect.
The application provides a voiceprint recognition method, a system, a storage medium and an electronic device, comprising: receiving voice information; extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer; comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result; and judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint. According to the method and the device, the voiceprint features are extracted through the convolutional neural network added with the deformable convolutional layer, so that adaptive receptive field change is carried out on different voiceprint features, the finally obtained convolutional neural network model has higher robustness, and the voiceprint recognition precision is improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a schematic flowchart of a voiceprint recognition method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a convolutional neural network provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of a voiceprint recognition system according to an embodiment of the present application;
fig. 4 is a connection block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following detailed description is provided with reference to the accompanying drawings and embodiments, so that how the technical means are applied to solve the technical problems and achieve the corresponding technical effects can be fully understood and implemented. The embodiments of the present application and the various features in the embodiments can be combined with each other provided there is no conflict, and the resulting technical solutions all fall within the scope of protection of the present application.
As described in the background, in existing voiceprint recognition methods that model with a deep convolutional neural network, the convolution kernel performs its operation on a local region of the input feature. With traditional square convolution, only the voiceprint features inside a fixed square region can be sampled and the receptive field cannot adapt to different voiceprint features, so the final voiceprint recognition effect is poor.
In view of this, the present application provides a voiceprint recognition method, system, storage medium and electronic device, which solve the technical problem in the related art that the final voiceprint recognition effect is poor due to the square convolution with a fixed receptive field.
Example one
Fig. 1 is a schematic flow chart of a voiceprint recognition method provided in an embodiment of the present application, and as shown in fig. 1, the method includes:
s101, receiving voice information;
s102, extracting voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model;
in step S102, the pre-trained convolutional neural network model is trained by a convolutional neural network including a deformable convolutional layer.
S103, comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
and S104, judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint.
It should be noted that, because the set of people to be verified by the voiceprint recognition system is not fixed, it is impractical to retrain the convolutional neural network model every time a new person is added. The convolutional neural network model therefore serves only for feature extraction in the whole method and is not used as a classifier or recognizer.
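As an illustration of this feature-extraction role, the following minimal sketch (in PyTorch; the helper names, the use of cosine comparison here, and the 0.7 threshold are assumptions rather than details fixed by the patent) shows the trained network used only to produce embeddings, which are then compared in steps S103 and S104:

    import torch
    import torch.nn.functional as F

    def extract_embedding(model: torch.nn.Module, features: torch.Tensor) -> torch.Tensor:
        """Run the frozen CNN on a (1, 1, n_mfcc, frames) feature map and return a 1-D embedding."""
        model.eval()
        with torch.no_grad():
            return model(features).squeeze(0)

    def verify(model, features_to_verify, registered_embedding, threshold=0.7):
        """Steps S103/S104: compare the new embedding with a registered one and apply the threshold."""
        emb = extract_embedding(model, features_to_verify)
        similarity = F.cosine_similarity(emb, registered_embedding, dim=0).item()
        return similarity > threshold, similarity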
Optionally, the comparing the similarity between the voiceprint feature to be verified and the registered voiceprint feature that is registered in advance to obtain a similarity result includes:
and calculating the similarity between the voiceprint features to be verified and the registered voiceprint features which are registered in advance by a cosine calculation method to obtain a similarity result.
It should be noted that the present invention is not limited to computing the similarity between the voiceprint feature to be verified and the pre-registered voiceprint feature with a cosine calculation method; other calculation methods may also be used as needed, as long as the similarity between the two features is obtained.
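For reference, the cosine similarity mentioned above can be computed as in the following minimal NumPy sketch (the patent does not prescribe a particular implementation):

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine of the angle between two voiceprint embeddings; values closer to 1 mean more similar."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

Any other similarity measure could be substituted here, consistent with the note above.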
Optionally, the process of registering the voiceprint feature includes:
receiving registration voice information;
extracting registration voiceprint characteristics in the registration voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer.
It should be noted that, to handle a growing number of people to be identified, a pre-registration step may be adopted: the voiceprint features of newly added people are registered in advance and used for similarity comparison in subsequent voiceprint recognition.
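A minimal sketch of such a pre-registration step, assuming a hypothetical in-memory dictionary keyed by user identity (the storage choice is not specified in the patent):

    import torch

    registered_voiceprints: dict[str, torch.Tensor] = {}

    def register(user_id: str, model: torch.nn.Module, features: torch.Tensor) -> None:
        """Store the enrolment embedding produced by the frozen feature-extraction network."""
        model.eval()
        with torch.no_grad():
            registered_voiceprints[user_id] = model(features).squeeze(0)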
Optionally, the training process of the convolutional neural network model includes:
establishing a convolutional neural network; the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer and a full-connection layer which are sequentially arranged; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer, the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer;
and training the convolutional neural network by taking the training voiceprint features which are marked in advance as input to obtain the convolutional neural network model.
It should be noted that the first convolutional layer performs preliminary extraction on the input training voiceprint features to obtain intermediate-layer features, which facilitates subsequent processing by the deformable convolutional layer and the layers that follow.
Specifically, Fig. 2 shows the schematic structural diagram of the convolutional neural network provided in the embodiment of the present application. The training voiceprint features are input into the first convolutional layer and feature extraction is performed to obtain intermediate-layer features, which are then processed by each subsequent layer in turn and output through the fully connected layer.
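The layer order of Fig. 2 could be realized, for example, with PyTorch and torchvision's DeformConv2d as sketched below; the channel widths, kernel sizes, global average pooling, and embedding size are illustrative assumptions, since the patent does not fix them:

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class DeformableBlock(nn.Module):
        """One sub-deformable layer: a plain convolution predicts per-position offsets
        (two values per kernel element), which DeformConv2d uses to shift its sampling points."""
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
            self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

        def forward(self, x):
            return torch.relu(self.deform_conv(x, self.offset_conv(x)))

    class VoiceprintCNN(nn.Module):
        """Layer order follows Fig. 2: first convolutional layer (two sub-convolutions),
        first pooling layer, deformable convolutional layer (two sub-layers),
        second pooling layer, second convolutional layer, fully connected layer."""
        def __init__(self, embedding_dim=256):
            super().__init__()
            self.first_conv = nn.Sequential(            # first convolutional layer: two sub-convolutions
                nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            )
            self.pool1 = nn.MaxPool2d(2)                # first pooling layer
            self.deform1 = DeformableBlock(32, 64)      # first sub-deformable convolutional layer
            self.deform2 = DeformableBlock(64, 64)      # second sub-deformable convolutional layer
            self.pool2 = nn.MaxPool2d(2)                # second pooling layer
            self.second_conv = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())
            self.fc = nn.Linear(128, embedding_dim)     # fully connected output layer

        def forward(self, x):                           # x: (batch, 1, n_mfcc, frames)
            x = self.pool1(self.first_conv(x))
            x = self.pool2(self.deform2(self.deform1(x)))
            x = self.second_conv(x)
            x = x.mean(dim=(2, 3))                      # global average pooling (an assumption)
            return self.fc(x)

During training, the offset-predicting convolutions and the deformable convolutions are learned jointly, which is what allows the receptive field to deform per input.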
Optionally, the training voiceprint feature is a mel-frequency cepstrum coefficient feature.
It should be noted that Mel-frequency cepstral coefficients (MFCCs) are cepstral parameters extracted in the Mel-scale frequency domain, where the Mel scale describes the nonlinear way the human ear perceives frequency. Because these features do not depend on the nature of the signal, make no assumptions or restrictions about the input signal, and draw on the findings of auditory models, they are more robust than LPCC features based on the vocal-tract model, better match the auditory characteristics of the human ear, and still give good recognition performance when the signal-to-noise ratio drops.
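For example, the pre-labeled training voiceprint features could be produced with librosa as sketched below (the library choice, 16 kHz sample rate, and 40 coefficients are assumptions; the patent only states that MFCC features are used):

    import librosa
    import numpy as np

    def mfcc_features(wav_path: str, n_mfcc: int = 40) -> np.ndarray:
        """Return an (n_mfcc, frames) MFCC matrix; add batch and channel axes before feeding the CNN."""
        y, sr = librosa.load(wav_path, sr=16000)              # resample to 16 kHz (assumed)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)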
Optionally, the deformable convolution layer is configured to add an offset parameter to each element of the convolution kernel to obtain the adaptive receptive field.
It should be noted that after the training voiceprint features are input into the first convolutional layer, feature extraction is performed to obtain intermediate-layer features, and at the same time a specific offset for each element of the convolution kernel can be learned. When training reaches the deformable convolutional layer, an offset parameter is added to the specific offset of each convolution kernel element; this offset lets the receptive field of the sampling network adjust adaptively to the shape of the target being measured, so that the most accurate features are obtained.
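As a concrete shape check (the tensor sizes are illustrative only): for a 3x3 kernel, each output position carries 2 x 3 x 3 = 18 offset values, one (x, y) shift per kernel element, and all-zero offsets reduce the layer to an ordinary convolution:

    import torch
    from torchvision.ops import DeformConv2d

    x = torch.randn(1, 32, 20, 40)                 # (batch, channels, MFCC bins, frames)
    offset = torch.zeros(1, 2 * 3 * 3, 20, 40)     # one (x, y) offset per kernel element and position
    deform = DeformConv2d(32, 64, kernel_size=3, padding=1)
    print(deform(x, offset).shape)                 # torch.Size([1, 64, 20, 40])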
Optionally, the first pooling layer and the second pooling layer are used to reduce feature size, enlarge the receptive field, and/or reduce the computational effort.
In summary, an embodiment of the present application provides a voiceprint recognition method, including: receiving voice information; extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolution neural network model which is trained in advance is obtained by training a convolution neural network comprising a deformable convolution layer; comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result; and judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint. The voiceprint features are extracted through the convolutional neural network added with the deformable convolutional layer, so that adaptive receptive field change is carried out on different voiceprint features, the finally obtained convolutional neural network model has higher robustness, and the voiceprint recognition precision is improved.
Example two
Based on the voiceprint recognition method disclosed in the above embodiment of the present invention, fig. 3 specifically discloses a voiceprint recognition system applying the voiceprint recognition method.
As shown in fig. 3, the embodiment of the present invention discloses a voiceprint recognition system, which includes:
a receiving unit 301, configured to receive voice information;
an extracting unit 302, configured to extract a voiceprint feature to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
a comparing unit 303, configured to compare similarity between the voiceprint feature to be verified and a registered voiceprint feature that is registered in advance, to obtain a similarity result;
the verification unit 304 is configured to determine whether the similarity result is greater than a preset threshold, and if the similarity result is greater than the preset threshold, the voiceprint recognition is successful.
For the specific working processes of the receiving unit 301, the extracting unit 302, the comparing unit 303 and the verifying unit 304 in the voiceprint recognition system disclosed in the embodiment of the present invention, reference may be made to the corresponding contents in the voiceprint recognition method disclosed in the above embodiment of the present invention, and details are not repeated here.
In summary, an embodiment of the present application provides a voiceprint recognition system, including: receiving voice information; extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer; comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result; and judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint. The voiceprint features are extracted through the convolutional neural network added with the deformable convolutional layer, so that adaptive receptive field change of different voiceprint features is realized, the finally obtained convolutional neural network model has higher robustness, and the voiceprint recognition precision is improved.
EXAMPLE III
The present embodiment further provides a computer-readable storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, or an application store, on which a computer program is stored. When executed by a processor, the computer program can implement the method steps of the first embodiment; the details are not repeated in this embodiment.
Example four
Fig. 4 is a connection block diagram of an electronic device 500 according to an embodiment of the present application, and as shown in fig. 4, the electronic device 500 may include: a processor 501, a memory 502, a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
The processor 501 is configured to execute all or part of the steps in the voiceprint recognition method according to the first embodiment. The memory 502 is used to store various types of data, which may include, for example, instructions for any application or method in the electronic device, as well as application-related data.
The processor 501 may be implemented by an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to perform the voiceprint recognition method of the first embodiment.
The memory 502 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The multimedia component 503 may include a screen, which may be a touch screen, and an audio component for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in a memory or transmitted through a communication component. The audio assembly also includes at least one speaker for outputting audio signals.
The I/O interface 504 provides an interface between the processor 501 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons.
The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 505 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In summary, the present application provides a voiceprint recognition method, a system, a storage medium, and an electronic device, where the method includes: receiving voice information; extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer; comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result; and judging whether the similarity result is greater than a preset threshold value, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint. The voiceprint features are extracted through the convolutional neural network added with the deformable convolutional layer, so that adaptive receptive field change is carried out on different voiceprint features, the finally obtained convolutional neural network model has higher robustness, and the voiceprint recognition precision is improved.
In the embodiments provided in the present application, it should be understood that the disclosed method can be implemented in other ways. The above-described method embodiments are merely illustrative.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
Although the embodiments disclosed in the present application are described above, the above descriptions are only for the convenience of understanding the present application, and are not intended to limit the present application. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims (9)

1. A voiceprint recognition method, the method comprising:
receiving voice information;
extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
judging whether the similarity result is greater than a preset threshold value or not, and if the similarity result is greater than the preset threshold value, successfully identifying the voiceprint;
the training process of the convolutional neural network model comprises the following steps:
establishing a convolutional neural network; the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer and a full-connection layer which are sequentially arranged; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer, the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer;
and training the convolutional neural network by taking the training voiceprint features which are marked in advance as input to obtain the convolutional neural network model.
2. The method according to claim 1, wherein the comparing the similarity between the voiceprint feature to be verified and the registered voiceprint feature that has been registered in advance to obtain a similarity result comprises:
and calculating the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance by a cosine calculation method to obtain a similarity result.
3. The method of claim 1, wherein the registration process for registering the voiceprint feature comprises:
receiving registration voice information;
extracting registration voiceprint characteristics in the registration voice information through a pre-trained convolutional neural network model; the pre-trained convolutional neural network model is obtained by training a convolutional neural network comprising a deformable convolutional layer.
4. The method of claim 1, wherein the training voiceprint features are mel-frequency cepstral coefficient features.
5. The method of claim 1, wherein the deformable convolution layer is configured to add an offset parameter to each element of the convolution kernel to obtain an adaptive receptive field.
6. The method of claim 1, wherein the first and second pooling layers are used to reduce feature size, enlarge the receptive field, and/or reduce computational effort.
7. A voiceprint recognition system, said system comprising:
a receiving unit for receiving voice information;
the extracting unit is used for extracting the voiceprint features to be verified in the voice information through a pre-trained convolutional neural network model; the convolutional neural network model which is trained in advance is obtained by training a convolutional neural network comprising a deformable convolutional layer;
the comparison unit is used for comparing the similarity of the voiceprint features to be verified and the registered voiceprint features which are registered in advance to obtain a similarity result;
the verification unit is used for judging whether the similarity result is greater than a preset threshold value or not, and if the similarity result is greater than the preset threshold value, the voiceprint recognition is successful;
the training process of the convolutional neural network model comprises the following steps:
establishing a convolutional neural network; the convolutional neural network comprises a first convolutional layer, a first pooling layer, a deformable convolutional layer, a second pooling layer, a second convolutional layer and a full-connection layer which are sequentially arranged; the first convolutional layer comprises a first sub-convolutional layer and a second sub-convolutional layer, the deformable convolutional layer comprises a first sub-deformable convolutional layer and a second sub-deformable convolutional layer;
and training the convolutional neural network by taking the training voiceprint features which are marked in advance as input to obtain the convolutional neural network model.
8. A storage medium storing a computer program executable by one or more processors for implementing a voiceprint recognition method as claimed in any one of claims 1 to 6.
9. An electronic device, comprising a memory and a processor, wherein the memory has a computer program stored thereon, and the memory and the processor are communicatively connected to each other, and wherein the computer program, when executed by the processor, performs the voiceprint recognition method according to any one of claims 1 to 6.
CN202011409154.8A 2020-12-04 2020-12-04 Voiceprint recognition method, system, storage medium and electronic device Active CN112614493B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011409154.8A CN112614493B (en) 2020-12-04 2020-12-04 Voiceprint recognition method, system, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011409154.8A CN112614493B (en) 2020-12-04 2020-12-04 Voiceprint recognition method, system, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN112614493A CN112614493A (en) 2021-04-06
CN112614493B (en) 2022-11-11

Family

ID=75228922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011409154.8A Active CN112614493B (en) 2020-12-04 2020-12-04 Voiceprint recognition method, system, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN112614493B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327604A (en) * 2021-07-02 2021-08-31 因诺微科技(天津)有限公司 Ultrashort speech language identification method
CN114093370B (en) * 2022-01-19 2022-04-29 珠海市杰理科技股份有限公司 Voiceprint recognition method and device, computer equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107610709B (en) * 2017-08-01 2021-03-19 百度在线网络技术(北京)有限公司 Method and system for training voiceprint recognition model
CN108564025A (en) * 2018-04-10 2018-09-21 广东电网有限责任公司 A kind of infrared image object identification method based on deformable convolutional neural networks
CN108766445A (en) * 2018-05-30 2018-11-06 苏州思必驰信息科技有限公司 Method for recognizing sound-groove and system
CN110047490A (en) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 Method for recognizing sound-groove, device, equipment and computer readable storage medium
CN111368684A (en) * 2020-02-27 2020-07-03 北华航天工业学院 Winter wheat automatic interpretation method based on deformable full-convolution neural network

Also Published As

Publication number Publication date
CN112614493A (en) 2021-04-06

Similar Documents

Publication Publication Date Title
US9502038B2 (en) Method and device for voiceprint recognition
JP6859522B2 (en) Methods, devices, and systems for building user voiceprint models
CN110310623B (en) Sample generation method, model training method, device, medium, and electronic apparatus
TWI527023B (en) A voiceprint recognition method and apparatus
CN109801634B (en) Voiceprint feature fusion method and device
KR102413282B1 (en) Method for performing personalized speech recognition and user terminal and server performing the same
CN112614493B (en) Voiceprint recognition method, system, storage medium and electronic device
US10909991B2 (en) System for text-dependent speaker recognition and method thereof
CN109473105A (en) The voice print verification method, apparatus unrelated with text and computer equipment
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
CN110265035B (en) Speaker recognition method based on deep learning
WO2019232826A1 (en) I-vector extraction method, speaker recognition method and apparatus, device, and medium
CN113129867B (en) Training method of voice recognition model, voice recognition method, device and equipment
CN110634492B (en) Login verification method, login verification device, electronic equipment and computer readable storage medium
CN111161713A (en) Voice gender identification method and device and computing equipment
TW202213326A (en) Generalized negative log-likelihood loss for speaker verification
Khdier et al. Deep learning algorithms based voiceprint recognition system in noisy environment
CN108776795A (en) Method for identifying ID, device and terminal device
CN109545226B (en) Voice recognition method, device and computer readable storage medium
CN112347788A (en) Corpus processing method, apparatus and storage medium
CN113421573B (en) Identity recognition model training method, identity recognition method and device
CN111199742A (en) Identity verification method and device and computing equipment
CN105575385A (en) Voice cipher setting system and method, and sound cipher verification system and method
CN112992155B (en) Far-field voice speaker recognition method and device based on residual error neural network
CN113948089B (en) Voiceprint model training and voiceprint recognition methods, devices, equipment and media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant